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MPC601 


Addendum to 


PowerPC™ 601 RISC Microprocessor User’s 
Manual 


~ This addendum describes additions and corrections to the PowerPC 601 RISC 
Microprocessor User’s Manual. For convenience, this information is organized according 
to the chapters in the user’s manual; however, there is no attempt to provide a 
comprehensive list of text that is affected by this information. These changes will be 
incorporated in the first revision of the user’s manual. 


Section 1: Chapter 1, “Overview” 

This section describes additional information and corrections to Chapter 1, “Overview.” 
Section 

Number 

1.1.5 In the first sentence in this section replace Terabyte with Petabyte. 

1.3.8.4 Replace Figure 1-6 with the following: 


PowerPC is a trademark of Intemational Business Machines Corporation. 
This document contains information on a new product under development. Specifications and information herein are subject to change without notice. 


See disclaimers on the last page of this addendum. 
os maces ieee (MA) MOTOROLA EZ 


© MOTOROLA INC. 1993 6/93 
Instruction set and other portions hereof © International Business Machines Corp. 1993 REV 1 
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43.6V 
Figure 1-6. MPC601 Signal Groups 


1.3.6.2 Replace the last sentence of the fourth paragraph with the following: 


The processor ensures that the ITLB is consistent with the UTLB, and uses an 
LRU replacement algorithm when a miss is encountered. 


Section 2: Chapter 2, “Registers and Data Types” 


This section describes additional information and corrections to Chapter 2, “Registers and 
Data Types.” 


21 Replace Figure 2-1 with the following: 


ADDENDUM-2 Addendum to the PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 









‘ USER PROGRAMMING User-Level SPRs 

MODEL SPRO 
SPAI 
SPRS 
spre 
ered et ine eae a SS ene 


Supervisor-Level SPRs 






















Condition 
Register 


SPR18 i 

SPRI9 

| SPR20 
0 31. COS! SPR21 
| 
: 
! 
! 
! 


0 31 


Floating Point 
Status and 
Control! 
Register 


FPSCR 


31 





SPR25 
SPR26 
SPR27 
SPR272 
SPR273 

5 SPR274 
SPR275 
SUPERVISOR PROGRAMMING SPR282 
MODEL SPR287 

SPR528 

SPR529 

Segment SPR530 

Registers SPR531 

SPR532 
SPR533 
SPR534 


(=) 


Machine State 
Register @ SPR535 
MSR SPR1008 

0 31 0 31 SPR1009 
SPR1010 

SPR1013 

SPR1023 





' MPC601-only registers. These registers are not necessarily supported by other PowerPC processors. 

2 These registers may be implemented differently on other PowerPC processors. The PowerPC architecture defines two sets of 
BAT registers—eight IBATs and eight DBATs.The MPC601 implements the IBATs and treats them as unified BATs. 

? The RTCU and RTCL registers can be written only in supervisor mode, in which case SPR20 and SPR21 are used. 

4 The DEC register can be read by user programs by specifying SPR6 in the mfspr instruction (for POWER compatibility). 


Figure 2-1. Programming Model—Registers 
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2.1 


2.1 


2.2.4.1 


220 


Dinas 


2.3.3.10 
2 airet 


2a 512A 


The mechanism referred to for accessing SPRs is the set of Move to/from SPR 
instructions (mtspr and mfspr). These instructions are commonly used to 
access certain registers, while other SPRs may be more typically accessed as 
the side effect of executing other instructions. 


The MSR register is 64 bits wide in 64-bit implementations and is 32 bits wide 
in 32-bit implementations. 


Replace the first paragraph in this section with the following: 


In most integer instructions, when the Rc bit is set, the first three bits in CRO 
are set by an algebraic comparison of the result to zero; the fourth bit of CRO 
is copied from the XER[SO] bit. The addic., andi. and andis. instructions set 
these four bits implicitly. These bits are interpreted as follows—if any portion 
of the result (the 32-bit value placed into the target register) is undefined, the 
value placed into the first three bits of CRO is undefined. 


The mechanism referred to for accessing SPRs is the set of Move to/from SPR 
instructions (mtspr and mfspr). These instructions are commonly used to 
access certain registers, while other SPRs may be more typically accessed as 
the side effect of executing other instructions. 


In user-level access, RTCU and RTCL are read-only. The SPR numbers for the 
RTCU and RTCL differ depending upon whether the mtspr or mfspr 
instruction is used. For the mfspr instruction, RTCU is SPR4 and RTCL is 
SPRS. For the mtspr instruction, RTCU is SPR20 and RTCL is SPR21 
(supervisor-level access only). 


The MSR is not an SPR and should not be included in Table 2-15. 


The RTCU and RTCL registers can be written to only in supervisor mode and 
the mtspr instruction requires a different SPR encoding. For the mtspr 
instruction, RTCU is SPR20 and RTCL is SPR21. 


The PowerPC architecture defines the DEC register as supervisor-only access 
for both reads and writes. SPR22 is used for both reads and writes. The 
POWER architecture provides user-level read access, using SPR6. To ensure 
compatibility with subsequent PowerPC processors, the mfspr instruction 
should not be used in user-level. 


The PVR is a read-only register that cannot be modified. 


The HIDO register is set to x'8001 0080' by the hard reset operation. However, 
the state of the EMC bit depends on the results of the power-on diagnostics for 
the main cache array. This bit is set if the cache fails the built-in self test during 
the power-on sequence. 


Checkstop enable bits can be set or cleared without restriction. If a checkstop 
source bit is set, itcan be cleared; however, if the corresponding checkstop 
condition is still present on the next clock, the bit will be set again. A checkstop 
source bit can only be set when the corresponding checkstop condition occurs 
and the checkstop enable bit is set; it cannot be set via an mtspr instruction. 
That is, you cannot manually cause a checkstop condition. 
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2o.b2.2 


2.4.3 


MOTOROLA 


Note that when HID1[8-—9] = 10, the trap address of x'2000' has a base address 
indicated by the setting of MSR[IP]. This mode is valid for address 
comparisons and may produce unpredictable results when used with the HID 
single-instruction step mode. 

Replace sentence 2 of paragraph 2 with the following: 

All transfers of individual scalars between registers and storage are of double 
words. A subset of the 64-bit scalar (e.g., a byte) is not addressable in storage. 
As a result, to access any subset of the bits of a scalar, the entire 64-bit scalar 
must be accessed, and when a storage location is read, the 64-bit value returned 
is the 64-bit value last written to that location. 

The following example shows how the byte ordering is changed from big- to 
little-endian mode by setting HIDO[28] ( refers to the address): 


<msr[ee] is off (zero) > 


n sync [Instructions 

nt+4 — sync laccessed in 

n+8 sync Ibig-endian mode 
n+c mtspr hid0(28) { 

n+10 syne IInstructions 

n+14 — syne laccessed in 

n+18 sync llittle-endian mode 


The same instruction sequence can be used to go from little- to big-endian 
mode by clearing HIDO[28]. 


Little-Endian Address Manipulation 


In little-endian operations, the three least significant bits of an address are 
manipulated as described in Chapter 2, “Registers and Data Types,” to provide 
the appearance of a little-endian memory to the program for aligned loads and 
stores, as follows: 


New_addr(29) <- EA(29) xor (word | half | byte) 

New_addr(30) <- EA(30) xor (half | byte) 

New_addr(31) <- EA(31) xor (byte) 

The physical address used for a access generated by a load or a store to an 
operand that is less than a double word is modified as indicated. Addresses for 


aligned double word accesses and cache control operations are not modified 
since the endian mode has no effect on aligned accesses greater than one word. 


The DAR and SRRO will contain the program address (or the next sequential 
address, as appropriate) after all exceptions. If the processor is in little-endian 
mode, it will be a modified address. If the processor is in big-endian mode, the 
address is unmodified. 
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2.4.4 


2.4.5 


The T bit does not affect address manipulation or the detection of alignment 
exception conditions. Therefore I/O interface controller operations and BUID 
x'07F' segments receive the modified address. The ecowx and eciwx 
instructions are treated as no-ops if the T bit is set regardless of whether the 
MPC601 is in little-endian mode. 


Because the MPC601 defines a cache block as 32 bytes, bits 27-31 of the 
address are not used for snooping. The program address should be specified, 
when an address is loaded into HID2 or HIDS. That is, if the processor is in 
little-endian mode, a little-endian address should be specified, and if the 
processor is in big-endian mode, a big-endian address should be specified. 


Little-Endian Alignment Exceptions 


Additional alignment exception conditions can occur when the processor is in 
little-endian mode. 


Load/store multiple operands (regardless of EA) 


« Imw stmw 
° Iscbxx stswi 
° Iswi stswx 
° Iswx 


The new alignment exception conditions are prioritized with other alignment 
exceptions ahead of data access exceptions. For more information see Section 
2.4.6.2 “Misaligned Scalars.” 


Little-Endian Instruction Fetching 


In little-endian mode, instructions are fetched in big-endian order; however, 
the instructions are swapped within a double word before being passed to the 
instruction queue, thus putting the instructions in little-endian order for 
execution. On exceptions, the MPC601 reports the correct effective address (as 
defined by the programming model or computed by a storage access 
instruction) regardless of the endian mode selected. 


The second line of the program example is incorrect. Replace 
double b: /* x'212223242225262728' doubleword */ 

with the following: 

double b: /* x'2122232425262728' doubleword */ 


The MPC601 big- and little-endian mode operation differs from the PowerPC 
architecture in the following ways: 


¢ Choice of big- or little-endian modes is provided through HIDO[LM]—bit 28 
of HIDO. The PowerPC architecture defines two bits in the MSR for this 
purpose. 
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¢ The basic mode switching sequence requires three sync instructions followed 
by the mtspr access to HIDO[28], followed by three more sync instructions. 
This sequence should be used whenever the state of this bit is changed. 


¢ External and decrementer interrupts should be disabled before executing the 
sequence. 


¢ The starting address of the sequence does not matter; however, the sequence 
cannot cross a protection boundary. 


¢ In some cases the mtspr access to HIDO[LM] can occur twice depending on 
the alignment of the instruction. 


¢ In some cases not all of the sync instructions will actually be executed, 
depending on the starting address of the sequence. 


¢ Although HIDO[LM] can be switched dynamically, there are certain 
constraints (such as turning off translation and emptying the memory queues) 
that must be considered before the bit can be switched. Note that,when 
switching modes between tasks, this code sequence may not allow the 
MPC601 to operate at an optimal performance level. 


Section 3: Chapter 3, “Addressing Modes and 


Instruction Set Summary” 


This section describes additional information and corrections to Chapter 3, “Addressing 
Modes and Instruction Set Summary.” 


Section 
Number 


3i1:2 


3: 


we) 


4.2 


we) 
iN 
be 


MOTOROLA 


The first sentence in this section should include the isync instruction but should 
not include the mtmsr instruction. 


In Table 3-9, in the descriptions of the Shift Left Word, Shift Right Word, and 
Shift Right Algebraic Word instructions the number of bits specified by rB 
should be rB(27—31) instead of rB(26—31). 


The PowerPC architecture specifies that for the two floating-point convert to 
integer instructions, fctiw and fctiwz, both FPSCR(VXCVI) and 
FPSCR(VXSNAN) are set when the input operand is an SNaN. The MPC601 
sets only FPSCR(VXCVI). 


Add the following information: 


In future implementations, the load/store multiple instructions are likely to 
have greater latency and take longer to execute, perhaps much longer, than a 
sequence of individual load or store instructions that produce the same results. 
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Add the following information: 


In future implementations, the integer move string instructions are likely to 
have greater latency and take longer to execute, perhaps much longer, than a 
sequence of individual load or store instructions that produce the same results. 


Add the following information: 


The paired use of the Iwarx and stwex. instructions allows programmers to 
emulate common semaphore operations such as test and set, compare and 
swap, exchange memory, and fetch and add. 


The concept behind this part of the architecture is that a processor may load a 
semaphore from storage, compute a result based on the value of the semaphore, 
and conditionally store it back to the same location. The conditional store is 
performed based upon the existence of a reservation established by the 
preceding Iwarx instruction. If the reservation exists when the store is 
executed, the store is performed and a bit is set in the condition register. 


If the reservation does not exist when the store is executed, the target storage 
location is not modified and a bit in the condition register is cleared. The lwarx 
and stwcx. primitives allow software to read a semaphore, compute a result 
based on the value of the semaphore, store the new value back into the 
semaphore location only if that location has not been modified since it was first 
read, and determine if the store was successful. 


If the store was successful, the sequence of instructions from the read of the 
semaphore to the store that updated the semaphore appear to have been 
executed atomically (i.c., no other processor or mechanism modified the 
semaphore location between the read and the update), thus providing the 
equivalent of a real atomic operation. However, other processors may have 
read from the location during this operation. 


The reservation set by an Iwarx instruction can be cleared by the following 
conditions: 


¢ The processor having the reservation executes a store conditional instruction 
to any address. 


¢ Another device executes any store instruction to any address in the 32-byte 
sector associated with the reservation. 


¢ The processor with the reservation takes any exception. 

¢ The processor with the reservation executes an sc instruction. 

¢ The processor with the reservation executes a trap instruction that takes a 
program exception. 


In a uniprocessor system, a program that modifies instructions it intends to 
execute must execute an isync instruction to ensure that all modifications are 
made visible to the instruction queue. Note that additional instructions are 
required for other PowerPC processors. 
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3.5.10.1 


3.6.1.1 
3.6.1.2 
3.6.1.3 
3.6.1.4 
3.6.3 


3.8.4 


Replace the first sentence of this section with the following: 


The steps for converting a floating-point value from the double-precision 
register format to single-precision memory format are as follows: 


Note that the LI field is appended with b'00" prior to the addition. 
Note that the BD field is appended with b'00' prior to the addition. 
Note that the LI field is appended with b‘00' prior to the addition. 
Note that the BD field is appended with b'00' prior to the addition. 


The PowerPC architecture defines the bectr instruction with the “decrement 
and test CTR” (BO, = 0) option as an invalid form, and attempting to execute 
such an instruction causes boundedly undefined results. However, the MPC601 
tests the count register for 0 and branches based on the result. Instruction 
fetching is directed to the address specified in the non-decremented version of 
the count register. 


The RTCU and RTCL registers can be read in user level and can be written to 
in supervisor level. The SPR encodings for reading the RTCU and RTCL 
registers are 4 and 5, respectively (regardless of whether the processor is in 
user- or supervisor level). The SPR encodings for writing RTCU and RTCL are 
20 and 21, respectively. 


The essence of the tlbie instruction may be broadcast onto the MPC601 bus 
interface. This function is enabled by setting HID1[17]. 


Section 4: Chapter 4, "Cache and Memory Unit 


Operation" 

This section describes additional information and corrections to Chapter 4, “Cache and 

Memory Unit Operation.” 

4.8 Delete the next to the last paragraph in this section. 

4.8.8 Replace the second paragraph with the following: 
The debi instruction cannot be used to invalidate instructions in the cache of 
the MPC601. This instruction may have the effect of unmodifying data storage 
depending upon timing, exceptions, and other events. 

4.11 In row #12 on page 4-28, “Four-beat write (quadword 2)” should update the 
sector status. The Current State column correctly contains an x but the Next 
State Column specifies no change. The next state should be M. 
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Section 5: Chapter 5, "Exceptions" 


This section describes additional information and corrections to Chapter 5, “Exceptions.” 


Section 
Number 


=F 


5.1.1.3 


5.4.4 


5.4.5 


5.4.6 


5.4.6.1.2 


5.4.10 


The last bullet in the entry for the instruction access exception in Table 5-1 
should read as follows: 


If the K bits in the segment register and the PP bits in the PTE or BAT are set 
to prohibit read access, instructions cannot be fetched from this location. 


The second bulleted item in this section should read as follows: 


SRRO addresses either the instruction that would have completed or some 
instruction following it that would have completed if the exception had not 
occurred. 


The following information should replace the appropriate bit descriptions for 
SRRI in Table 5-12. 


3 Cleared. Note that the PowerPC architecture defines this as set if the fetch access was to an I/O 
controiler interface segment (SR[T]}=1). Note that this condition causes SRR1[0-15] to be cleared 
in the MPC601. 

10 Cleared 


In early versions of the MPC601 (processor revision level x'0000’), the external 
interrupt is a level-sensitive signal and should be held active until reset by the 
interrupt service routine. Phantom interrupts due to phenomena such as 
crosstalk and bus noise should be avoided. 


In later versions of the MPC601 (processor revision level x‘0001' and higher), 
the MPC601 is guaranteed to detect an external interrupt when the INT signal 
is held active for at least two clock cycles. The MPC601 is guaranteed to ignore 
the INT signal if it is held for less than one clock cycle. 


The first DSISR value listed in Table 5-14 should be as follows: 


The following should be added to Table 5-14: 


DAR Set to the EA of the data access as computed by the instruction causing the 


alignment exception. 


Replace the third bulleted item with the following: 


The Iwarx/stwex./lscbx instructions that map into an I/O controller interface 
segment always cause a data access exception. However, if the instruction 
Crosses a segment boundary an alignment exception is taken instead. 


Note also that unlike exceptions that occur with memory accesses, loads and 
both loads and stores with update to I/O controller interface segments cause the 
target register to be updated, regardless of whether an exception is taken. 


ADDENDUM-10 Addendum to the PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 








5.4.12 Replace this section with the following: 
Run Mode/Trace Exception (x'02000') 
The MPC601 defines an implementation-specific exception called the run mode exception. 
This exception is taken by the MPC601 under the following circumstances: 
¢ Instruction address compare 
¢ Branch target address compare 


¢ Trace mode (MSR[SE] is set)—When an instruction clears MSR[SE], trace mode 
ends immediately. Note that other PowerPC processors implement a separate trace 
exception at vector x'00D00". 


Note that this exception may not be implemented by other PowerPC processors, and that 
this exception can be enabled and disabled using bits 8 and 9 in HID]; the exception is 
enabled when HID1[8,9] = b'01'. When this exception occurs, the registers are set as 
indicated in Table 5-24. 


Table 5-24. Run Mode Exception—Register Settings 
a 


SRRO Set to the address of the instruction that causes the run mode exception 
SRR1 Loaded from bits 0-31 of the MSR 


aa: : Zz 


PR Oo FE1 0 
FP1 0 EP Value is not altered 

The run mode is determined by the settings of HID1[1—3]. These settings are defined in 

Table 5-25. 















ME Value is not altered IT 0 
FEO 0O DT oO 


Table 5-25. Run Modes Setting 


eo 
es ee 
ee 
: 







Undefined. Do not use 
Full instruction address compare 
Full branch target address compare 
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Table 5-26 describes the run modes. 


Table 5-26. Run Modes Description 
a 
Normal run mode No address breakpoints are specified and the MPC601 processes zero to 
three instructions per cycle. 


Single instruction step mode In single instruction step mode, the fetcher processes one instruction at a 
time. After an instruction is processed and the chip quiesces, the appropriate 
break action is performed. Note that this mode is distinct from the trace 
exception, which depends on the setting of MSRISE]. 






















Limited instruction address 
compare mode 


The MPC601 runs at full speed until the EA of the instruction in the lowest 
position in the instruction queue (IQ0) matches the one specified in HID2. At 
this point the appropriate break action is performed. This is a limited compare 
in that branches and floating-point operations and the addresses associated 
with thom may never be detected. 





















Full instruction address 
compare mode 


In full instruction address compare mode, processing proceeds out of IQ0. 
When the EA in HID2 matches the EA of the instruction in IQ0, the 
appropriate break action is performed. Unlike the limited instruction address 
compare mods, all instructions pass through the 1Q0 in this mode. That is, 
instructions cannot be folded out of the instruction stream. 





Full branch target address 
compare mode 


This mode is similar to full instruction address compare mode except that the 
branch target is compared against HID2. When addresses match, the 

appropriate break action is taken. This allows the programmer to see how a 
program got to an address. This mode can be used with b, be, ber, and bec 
instructions. 







When the trace exception is enabled, (MSR[SE] is set), a trace interrupt is taken after each 
instruction that completes without causing an exception or context change (such as an sc, 
rfi, or a load instruction that causes an exception). MSR[SE] is cleared when the trace 
exception is taken. In the normal use of this function, MSR[SE] is restored when the 
exception handler returns to the interrupted program using an rfi instruction. 


Register settings for the trace mode are described in Table 5-27. 


Table 5-27. Trace Exception—Register Settings 
a 
SRRO Set to the address of the next instruction to be executed in the program for which the trace 
exception was generated 
SRRI1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 










MSR EE 0 SE 0 
PR 0O FE1 0 

: FP1 0 EP _ Value is not altered 
ME _ Value is not altered IT 0 


FEO 0 DT 0O 
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When a run mode or trace exception is taken, instruction execution resumes as offset 
x'02000' from the base address indicated by MSR[EP]. 


Section 6: Chapter 6, “Memory Management Unit” 


This section describes additional information and corrections to Chapter 6, “Memory 
Management Unit.” 


6.1.8 


6.7.6 


6.9.1.5.2 


6.9.2 


6.9.2 


The first two bullet items under constraints enforced for instruction prefetching 
should be deleted. 


The debt/debtst instruction branch of Figure 6-10 should say “Abort Access” 
instead of “Abort Translation.” 


The Hash Value 2 shown in Figure 6-22 shows an extra 4 bits (1111) that should 
be deleted. The Hash Value 2 should be replaced with the following: “101 1000 
0000 0001 1001.” 


Steps 1 and 5 of the page table search operation imply that PTEs are read into 
the processor as single-beat read operations. In reality, the MPC601 performs 
a burst read operation at the PTE address (to load a sector of the on-chip cache) 
and optionally performs a second burst read operation to fill the cache line. 
However, the PTEs are read from the cache and compared with the virtual 
address information one at a time. 


At the top of Figure 6-23, the fetch of a PTE is described as a single-beat read 
from PA. This should be replaced with a box showing that four PTEs are burst 
in at once, and four more PTEs may be burst in to fill a cache line. 


Section 7: Chapter 7, "Instruction Timing” 


This section describes additional information and corrections to Chapter 7, “Instruction 


Timing.” 


7.1 


MOTOROLA 


Replace Figure 7-1 with the following: 


CLOCK 1 (STAGE 2) A 


Figure 7-1. Pipelined Execution Unit 
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7.3.3 Replace the last two paragraphs in this section with the following: 


Double-precision floating-point multiply instructions spend multiple clock 
cycles in the decode and execute stages of the FPU. However, the fmul 
instruction is broken down into two parts (which in the FPU pipeline appear to 
be two instructions). This allows the instruction to occupy two stages in the 
FPU simultaneously. The first part of the instruction can begin FPU execute 1 
stage as the second part enters the decode stage. Likewise, when the first part 
of the instruction enters FPU execute 2 stage, the second part enters execute 1 
stage. 


This self-pipelining reduces the latency to five cycles and improves the 
throughput. For example, a series of fmul instructions would have a 
throughput of one instruction every two cycles. 


Section 8: Chapter 8, "Signal Descriptions" 


This section describes additional information and corrections to Chapter 8, “Signal 
Descriptions.” 


Section 
Number 


8.1 Replace Figure 8-1 with Figure 1-7 (shown on page Addendum-2 of this 
addendum). 


8.2.4.1.2 Replace the first two entries in Table 8-1 with the following: 
Special operations: This signal is asserted whenever a bus transaction is run in response to a 


iwarx/stwex. instruction pair, a TLBI (translation lookaside buffer invalidate) operation, or either an 
eciwx or ecowx instruction. 


Read (or write) operations: This signal indicates whether the transaction is a read (TT1 high) or a write 
(TT1 low). This assumes that the transaction is not address-only. 





8.2.4.2.1 Replace the last paragraph of the State Meaning section with the following: 


For external control instructions (eciwx and ecowx), TSIZO—TSIZ2 are used to 
output bits 29-3 1 of the external access register (EAR), which are used to form 
the resource ID (TBSTIITSIZO-TSIZ2). 


8.2.4.3.1 Replace the first paragraph of the State Meaning section with the following: 
Asserted—Indicates that a burst transfer is in progress. 

8.2.4.4 Replace the first sentence with the following: 
The transfer code (TCO-TC1) consists of two output signals on the MPC601. 
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8.2.4.6 


8.2.4.9 


8.2.6.1 


8.2,7,2.2 


8.2.9.1 


8.2.9.6 


8.2.11 


MOTOROLA 


Replace the first entry in Table 8-4 with the following: 


Assartion depends on whether the current transaction is a read or write operation; therefore, TCO 
should be used with TT1. On a read operation, TCO asserted indicates the transaction is an instruction 


fetch operation; otherwise, the read operation is a data operation. 
Asserting TCO for write operations indicates the cache sector associated with a write is being 
invalidated; TCO negated indicates the cache sector associated with a write is not being invalidated. 





Substitute the following for the State Meaning entry: 


Asserted—Indicates that a single-beat transaction is write-through, reflecting 
the value of the W bit for the block or page that contains the address of the 
current transaction. For burst writes, this indicates that the write is the result of 
a dcbf or debst instruction. 


Negated—Indicates that a transaction is not write-through. For bursts it is 
negated for cast-outs and snoop pushes. 


Add the following sentence to the first paragraph: 

This pin must be enabled by setting HIDO[31] if it is to be used. 

The Timing Comments should read as follows: “Assertion/Negation—Miust be 
valid throughout the entire address tenure.” 

Substitute the following for the Asserted information in the State Meaning 
section: 


Asserted—Indicates that the MPC601 may, with the proper qualification, 
assume mastership of the data bus. The MPC601 derives a qualified data bus 
grant when DBG is asserted and DBB, DRTRY, and ARTRY are negated; that 
is, the data bus is not busy (DBB is negated), there is no outstanding attempt to 
retry the current data tenure (DRTRY is negated), and there is no outstanding 
attempt to perform an ARTRY of the associated address tenure. 


Substitute the following for the next-to-last sentence in the State Meaning 
section: 


Detected even parity causes a checkstop if data parity errors are enabled in the 
HID register. 


Replace the First paragraph of the State Meaning section with the following: 


Asserted—The MPC601 latches the interrupt condition if the MSR(EE) bit is 
set and ignores the interrupt condition otherwise. To guarantee that the 
MPC601 will take the external interrupt, the INT pin must be held active until 
the MPC601 takes the interrupt; otherwise, the MPC601 may or may not take 
an external interrupt, depending on whether MSR[EE] bit was set while the 
INT signal was held active. 


Add the following sentence to the first paragraph: 
Note that systems that do not use this signal should tie it low. 
This section should be deleted. 
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8.2.12.3. Replace Figure 8-6 with the following: 


1ofif]2f3 {4,5 |] 6], 7] 8 | g | 10 [{ 11 | 
2X_PCLK 





P_CLK: 





Bus Transition : t 
‘Delay of iver eurnut 


Figure 8-6. Generation of Bus Transactions—Logical Bus Clock = 1/2 P_CLK 


Section 9: Chapter 9, "System Interface Operation" 


9.1.2 Replace the last sentence in the second bulleted item with the following: 


The update of the other sector can be disabled by setting bits in the HIDO 
register. HIDO[DRF], bit 26, can be used to disable fetches and HIDO[DRL], 
bit 27, can be used to disable loads and stores. 

9.2 Replace the second paragraph with the following: 


Figure 9-3 shows that the address and data tenures are distinct from one 
another and that both consist of three phases—arbitration, transfer, and 
termination. Address and data tenures are independent (indicated in Figure 9-3 
by fact that the data tenure begins before the address tenure ends), which 
allows split-bus transactions to be implemented at the system level in 
multiprocessor systems. Figure 9-3 shows a data transfer that consists of a 
single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte 
cache sectors require data transfer termination signals for each beat of data. 


Delete the second paragraph. 
Substitute the following row in Table 9-3. 


First transfer: 
two bytes 


9.3.2.4 Replace the last sentence of the second paragraph with the following: 


TCO negated indicates the write is not invalidating any cache sector (for 
example, write-through or cache-inhibited write operations.) 
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9.3.3 The second sentence of the second paragraph should read as follows: 


After ARTRY and SHD are asserted, they will be three-stated for two bus 
cycles and the system is responsible for precharging both ARTRY and SHD 
signals. 

9.3.3 Add the following sentence to the end of the fourth bulleted item: 


The override mode uses the HP_SNP_REQ signal to determine if the snoop 
queue is to be used. This mode is enabled by setting HIDO[31]. 


9.4.2 Replace the first sentence of the third paragraph with the following: 


The type of transaction initiated by the MPC601 depends on whether the code 
or data is cacheable and, for store operations, whether the cache is operated in 
write-back or write-through mode which software controls at either the page or 
block basis. 

9.4.3 Replace the last sentence with the following: 
ARTRY can also terminate a data bus transaction. For burst transactions, this 
ARTRY must occur no later than the cycle of the second TA. For single-beat 
transactions, it must occur no later than the cycle following TA. In either case, 
the ARTRY must be for the address bus tenure associated with the data bus 
tenure. 


9.4.3.1 Replace Figure 9-10 with the following: 









AACK 


Figure 9-10. Normal Single-Beat Read Termination 
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9.5 


9.6.4 


9.6.4 


9.10 


Replace Figure 9-11 with the following: 





Figure 9-11. Normal Single-Beat Write Termination 


The clock signals at the bottom of the figures in this section should be ignored. 
Replace the first two paragraphs with the following: 


This section shows timing diagrams for various scenarios. Figure 9-16 
illustrates the fastest single-beat reads. This figure shows both minimal latency 
and maximum single-beat throughput. By delaying the data bus tenure, the 
latency increases, but, because of split-transaction pipelining, the overall 
throughput is not affected unless the data bus latency causes the third address 
tenure to be delayed. 


Note that all bidirectional signals go to high-impedance between bus tenures. 
Replace the last sentence in the fourth paragraph with the following 


The MPC601 involved in this transaction, however, does not initiate any other 
I/O controller load or store operations once the first I/O controller interface 
operation has begun address tenure; however, if the I/O operation is retried, 
other higher-priority operations can occur. 


Replace the last sentence of the last paragraph with the following: 


If the TEA signal is not asserted with each tenure of a given I/O controller 
interface operation, the result of the assertion of TEA is unpredictable. The 
MPC601 may take a machine check exception or cause a checkstop condition. 


Add the following note after step 5: 
Note that steps 4 and 5 can occur in either order. 
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Section 10: Chapter 10, “Instruction Set” 

This section describes additional information and corrections to Chapter 10, “Instruction 
Set.” 

Imw Add the following information: 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load 
instructions that produce the same results. 


Ifsu The reference in the description of this instruction should be to Section 3.5.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions.” 


Ifsux The reference in the description of this instruction should be to Section 3.5.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions.” 


lfsx The reference in the description of this instruction should be to Section 3.5.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions.” 


Iswi Add the following information: 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load 
instructions that produce the same results. 


mffs This instruction is executed in the FPU rather than the IU. 
Iswx Add the following information: 


Under certain conditions (for example, segment boundary crossings) the 
alignment error handler may be invoked. For additional information about 
alignment exceptions, see Section 5.4.6, “Alignment Exception (x’00600’) 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load 
instructions that produce the same results. 


mfspr Replace Table 10-4 with the following: 
Table 10-4. SPR ee for mfspr 


Register Name 


ee 00000 | 00000 | Sa 


fe 00000 00110 
ey 00000 01000 
Ce Ce 
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Table 10-4. SPR Encodings for mfspr (Continued) 


Decimal } SPRIS-9]_ SPR[0-4] 







fers [otooo [woot [senci ‘| Sanson 
BAT1U Supervisor 
Foe [rn [roe [mance | Sipenior_| 


1023 41411 41441 PIR (HID15) 


'Note that the order of the two 5-bit halves of the SPR number is reversed compared with 
actual instruction coding. 


If the SPR field contains any value other than one of these implementation-specific 
values or one of the values shown in Table 3-40, the instruction form is invalid. 


SPR[0}=1 if and only if the register is being accessed at the supervisor level. Execution of 
this instruction specifying a defined and supervisor-level register when MSR[PR]=1 
results in a privilege violation type program exception. 


For mtspr and mfspr instructions, the SPR number coded in assembly language does 


74 
75 
82 
87 
29 
30 
31 
32 
33 
34 
35 


2 
2 
2 
2 
5 
5 
5 
5 
5 
5 
5 
1 
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not appear directly as a 10-bit binary number in the instruction. The number coded is split 
into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits 


appearing in bits 16-20 of the instruction and the low-order 5 bits in bits 11-15. 
SPR encodings for DEC, MQ, RTCL, and RTCU are not part of the PowerPC 


architecture. 


2On the MPC601, the mfspr instruction for the RTCU and RTCL registers must use these 
encodings (SPR4 and SPRS§, respectively) regardless of whether the processor is in 
supervisor or user mode. The mtspr instruction, which is supervisor-only for the RTCU 


and RTCL registers, must use the SPR20 and SPR21 encodings, respectively. 


3Read access to the DEC register is supervisor-only in the PowerPC architecture, using 
SPR22. However, the POWER architecture allows user-level read access using SPR6. 
Note that the SPR6 encoding for the DEC will not be supported by other PowerPC 
processors. 


mtspr 


MOTOROLA 


Replace Table 10-5 with the following: 


Table 10-5. SPR Encodings for mtspr 


1 


Ce 
Cn 
fs foowe—[oioor [ern ur 
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Table 10-5. SPR Encodings for mtspr (Continued) 


Register 
SPR[0-4] heme 


10000 Supervisor 
10000 Supervisor 
10000 Supervisor 
10000 Supervisor 


1008 11111 10000 Checkstop Supervisor 
Register 
(HIDO) 
11111 Debug Mode 
Register 
(HID1) 


1010 11111 IABR (HID2) | Supervisor 
11111 10101 DABR (HIDS5) Supervisor 


11111 11171 PIR (HID15) Supervisor 


‘Note that the order of the two 5-bit halves of the SPR number is reversed compared with 
actual instruction coding. 


If the SPR field contains any value other than one of these implementation-specific 
values or one of the values shown in Table 3-40, the instruction form is invalid. 


SPR{0]=1 if and only if the register is being accessed at the supervisor level. Execution of 
this instruction specifying a defined and supervisor-level register when MSR[PR]=1 
results in a privilege violation type program exception. 

For mtspr and mfspr instructions, the SPR number coded in assembly language does 
not appear directly as a 10-bit binary number in the instruction. The number coded is split 
into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits 
appearing in bits 16—20 of the instruction and the low-order 5 bits in bits 11~15. 


SPR encodings for DEC, MQ, RTCL, and RTCU are not part of the PowerPC 
architecture, 













Supervisor 
















2On the MPC601, the mfspr instruction for the RTCU and RTCL registers must use these 
encodings (SPR4 and SPR§, respectively) regardless of whether the processor is in 
supervisor or user mode. The mtspr instruction, which is supervisor-only for the RTCU 
and RTCL registers, must use the SPR20 and SPR21 encodings, respectively. 


3Read access to the DEC register is supervisor-only in the PowerPC architecture, using 
SPR22. However, the POWER architecture allows user-level read access using SPR6. 
Note that the SPR6 encoding for the DEC will not be supported by other PowerPC 
processors. 
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stmw 


stswi 


stswx 


10.3 


Add the following information: 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual 
store instructions that produce the same results. 


Add the following information: 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual 
store instructions that produce the same results. 


Add the following information: 


In future implementations, this instruction is likely to have greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual 
store instructions that produce the same results. 


The tlbiex instruction has been removed from the PowerPC architecture. and 
should be deleted from Table 10-6. 


Section 11: Appendixes 


This section describes additional information and corrections to the appendixes. 


Section 
Number 
App. A 
App. C 


App. D 


F.3.2 


MOTOROLA 


The slbiex and tlbiex instructions, which are not implemented in the MPC601, 
have been removed from the PowerPC architecture. 


The slbiex and tibiex instructions, which are not implemented in the MPC601, 
have been removed from the PowerPC architecture. 


The MPC601 takes an illegal instruction error exception for instructions that 
the PowerPC architecture defines as reserved, except for those POWER 
instructions that are implemented on the MPC601. 


Replace Round Integer(frac,gbit,rbit,xbit,round_mode) with the following: 
Round Integer(frac,gbit,rbit,xbit,round_mode) 


In this example, u represents an undefined hexadecimal digit. Comparisons 
ignore the u bits. 


inc <— 0 
If round_mode= b’00’ then 
Do 
If sign Il frac[64] I! gbit Il rbit Il xbit = b’ul luu’ then inc < 1 
If sign II frac(64] Il gbit Il rbit Il xbit = b’u011u’ then inc — 1 
If sign Il frac[64] I! gbit {I rbit l xbit = b’u0lul’ then inc — 1 
End 
If round_mode= b’10’ then 
Do 
If sign Il frac[64] Il gbit Il rbit Il xbit = b’Ouluu’ then ince-1 
If sign Il frac[64] | gbit Il rbit Il xbit = b’Ouulu’ then inc <— 1 
If sign Il frac[64] Il gbit Il rbit II xbit = b’Ouuul’ then inc — 1 
End 
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If round_mode= b’11’ then 
Do 
If sign II frac[64] II gbit Il rbit Il xbit = b’luluu’ then inc — 1 
If sign Il frac[64] I] gbit II rbit I] xbit = b’luulu’ then inc <— 1 
If sign I! frac[64] Il gbit Il rbit ! xbit = b’luuul’ then ince 1 
End 
frac[0-64] — frac[O-64] + inc 
FPSCR[FR] < inc 
FPSCR[FI] <— gbit | rbit | xbit 
Return 


App.H _ The following appendix should be added to the user’s manual. 


Appendix H 
MPC601 as a PowerPC Microprocessor 


The MPC601 processor is the first implementation of the PowerPC architecture. It offers a 
reliable platform for software and hardware developers to make products compatible with 
subsequent processors in the PowerPC family. In addition, the MPC601 provides 
extensions to the PowerPC architecture that allow it to function as a bridge from the 
POWER architecture. This appendix describes the POWER extensions as well as other 
differences between the MPC601 and the PowerPC architecture (PowerPC Architecture, 
First Edition). These differences can be categorized as follows: 


¢ POWER extensions—Additional functionality not defined in the PowerPC 
architecture. For example, the MPC601 implements many POWER instructions that 
do not have PowerPC equivalents. 


¢ Variances—MPC601 functionality that is implemented differently than as described 
in the PowerPC architecture. For example, there are several differences between the 
MPC601 MMU implementation and that specified by the PowerPC architecture. In 
general, these variances are not visible from the user level. 


¢ Implementation-dependent extensions—These include features that are not part of 
but are allowed by the PowerPC architecture. For example, the MPC601 provides a 
set of implementation-dependent registers (HIDs) to control hardware features such 
as parity checking and instruction address breakpoint that are beyond the 
specifications in architecture. Software should take appropriate precautions to 
control use of these features. 


* PowerPC optional features—These include optional features defined in the 
PowerPC architecture that are implemented in the MPC601. 


This appendix does not describe performance trade-offs allowed by the PowerPC 
architecture. For example, some implementations may provide more support for alignment 
than others and therefore they may require different amounts of assistance in the interrupt 
handler. 
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This appendix also does not describe variances built into the PowerPC architecture to 
provide some latitude in PowerPC implementations for handling reserved, invalid, and 
undefined conditions. These aspects are left intentionally undefined. Note that while the 
MPC601’s treatment of such aspects may be predictable, taking advantage of that behavior 
may cause software incompatibilities with other PowerPC implementations. 


Where applicable, a reference is given to the portion of the user’s manual that describes that 
functionality. 


Also, the tables in this appendix indicate the level of architecture at which the MPC601 
diverges. These levels are as follows: 


¢ PowerPC user instruction set architecture—Defines the base user-level instruction 
set, user-level registers, data types, floating-point exception model, memory models 
for a uniprocessor environment, and programming model for uniprocessor 
environment. 


The tables in this appendix identify differences with this part of the architecture by 
listing “User” in the “Level” column of the tables. 


¢ PowerPC virtual environment architecture—This describes the memory model for a 
multiprocessor environment, defines cache control instructions, and describes other 
aspects of virtual environments. Implementations that conform to the PowerPC 
virtual environment architecture also adhere to the PowerPC user instruction set 
architecture, but may not necessarily adhere to the PowerPC operating environment 
architecture. 


The tables in this appendix identify differences with this part of the architecture by 
listing “Virtual environment” in the “Level” column of the tables. 


¢ PowerPC operating environment architecture—This defines the memory 
management model, supervisor-level registers, synchronization requirements, and 
the exception model. Implementations that conform to the PowerPC operating 
environment architecture also adhere to the PowerPC user instruction set 
architecture and the PowerPC virtual environment architecture definition. 


The tables in this appendix identify differences with this part of the architecture by 
listing “Operating environment” in the “Level” column of the tables. 


H.1 POWER Extensions 


Table H-1 lists POWER functionality supported by the MPC601 that is not defined in the 
PowerPC architecture. POWER extensions include additional functionality not defined in 
the PowerPC architecture. 
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Table H-1. POWER Extensions 


The MQ register, provided for POWER Section 2.2.5.1, "MQ POWER 


compatibility, is not part of the PowerPC Register (MQ)” 
Virtual and POWER and 


architecture. 
operating Variance 
environment 


















Section 2.2.5.3, "Real- 
Time Clock (RTC) 
Registers" 


The MPC601 implements the real-time clock 
feature, including POWER registers RTCU and 
RTCL, to provide a time reference rather than the 
time base feature defined by the PowerPC 
architecture. 



































POWER and 
Variance 


User and 
virtual 
environment 







Section 2.3.3.4, 
"Decrementer (DEC) 
Register” 


In the MPC601 processor, the decrementer 
implementation uses the separate 7.8125 MHz 
RTC for its base frequency. Other PowerPC 
processors base the decrementer on the processor 
clock. 

























The decrementer register (DEC) in the MPC601 Section 2.3.3.4, | User and 
allows user-level read access, which is not "Decrementer (DEC) operating 
provided in the PowerPC architecture. Register" environment 










Section 3.7.1, "Move User and 
‘to/from Special Purpose operating 
Register Instructions” environment 


Because MPC601 supports the POWER registers, 
MQ, RTCU, and RTCL, instruction encodings to 
access them, mtspr and mfspr, are also provided. 










Appendix C, “PowerPC 
Instructions Not 
Implemented in MPC601” 


The MPC601 provides a group of instructions for 
compatibility with POWER. The relationship 
between the MPC601 and the PowerPC instruction 
sets are shown in Appendix C. The PowerPC 
architecture defines these instructions as reserved. 












H.2 Variances 


Variances include PowerPC functionality that is implemented in the MPC601 with some 
differences. In general, these variances are not visible from the user level. Table H-2 lists 
variances to the PowerPC architecture. 
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Table H-2. Variances 


The VXSOFT, VXSQRT, and NI bits (bits 21, 22, 
and 29, respectively) are not implemented in the 
MPC601 processor. 


If the Floating-Point Convert to Integer Word (fctiw) 
instruction results in a conversion exception, 
FPSCR[XCVI] is set causing FPSCR[VX] to be set. 
The PowerPC architecture specifies that when 
fctiw causes FPSCR[XCV]] to be set, FPSCR[XX] 
is not altered. The MPC601 may set both 
FPSCR[XX] and FPSCR[XCVI] in some 
circumstances. 


The architecture requires that both FPSCR[VXCVI] 
and FPSCR[VXSNAN] be set when the source 
operand of a fetiw is an SNAN. MPC601 sets only 
FPSCR[VXCVI]. 


PowerPC architecture defines the following bits in 
the machine state register (MSR) not implemented 
in the MPC601: 

Description 

Power management enable (POW) 

Interrupt little-endian mode (ILE) 

Branch trace enable (BE) 

Recoverable exception (RE) 

Little-endian mode (LE) 


The MPC601 provides a bit in an implementation- 
specific register (HIDO) for selecting between big- 
and little-endian modes. PowerPC architecture 
defines MSR[LE] for this purpose. 


The number, function, content, and format of the 
BAT registers implemented by the MPC601 is 
different than that specified by the PowerPC 
architecture. 


The MPC601 clears the reservation bit set by the 
execution of an Iwarx instruction when taking any 
type of exception. The PowerPC architecture 
defines that the reservation be cleared for a subset 
of exceptions. 


Because the MPC601 does not implement the 
MSR[RE] bit (recoverable exception bit), the 
operating system must use other criteria to 
determine if it is possible to recover from an 
asynchronous, imprecise interrupt. 


Instruction access exceptions due to instruction 
fetches from I/O controller interface segments do 
not set the SRR1[3] bit as defined in the PowerPC 
architecture. This condition clears SRR1[0-15]. 
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Section 2.2.3, "Floating- 
Point Status and Control 
Register (FPSCR)” 


Section 3.4.3, "Floating- 
Point Rounding and 
Conversion Instructions” 


Section 3.4.3, “Floating- 
Point Rounding and 
Conversion Instructions 


Section 2.3.1, "Machine 
State Register (MSR)” 


Section 2.4.3, "Byte and 
Bit Ordering” 


Section 2.3.3.11, "BAT 
Registers” 


Section 3.5.7, "Memory 
Synchronization 
Instructions” 


Section 5.4.1, "Reset 
Exceptions (x’00100’)"” 


Section 5: “Chapter 5, 
"Exceptions" of this 
addendum 


User and 
operating 
environment 


User and 
operating 
environment 


User and 
operating 
environment 


Operating 
environment 


Operating 
environment 


Operating 
environment 


User and 
operating 
environment 


Operating 
environment 


Operating 
environment 
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Variance 


Variance 


Variance 


Variance 


Variance 
Variance 


Variance 


Variance 
Variance 
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Table H-2. Variances (Continued) 


The non-IEEE mode bit, FPSCRINI], is reserved in | Section 5.4.7.1, "Floating- | Operating Variance 
the MPC601. All floating-point results are Point Enabled Program environment 
consistent with IEEE standards. Exceptions” 


The MPC601 maps I/O controller interface error Section 5.4.10, "YO Operating Variance 
conditions to I/O controller interface exceptions Controller Interface Error | environment 

instead of to the data access exception vector Exception (x’00A00’)” 

specified by the PowerPC architecture. 


Section 6.1.3, "Address Operating Variance 
Translation Mechanisms” | environment 














































Operating 
environment 


Unlike exceptions that occur with memory 
accesses, loads, loads with update, and stores with 
update to I/O controller interface segments cause 
any target registers to be updated, regardless of 
whether an exception is taken. 


Section 5: “Chapter 5, 
“Exceptions” of this 
addendum 






























Section 5.4.12, "Run 
Mode/Trace Exception 
(x'02000’)” 


Operating 
environment 


The MPC601 does not implement the trace 
exception as a separate exception as is defined in 
the PowerPC architecture (x’00D00'). The MPC601 
vectors trace exceptions to the run-mode/trace 

exception (x’02000’'). 
















The MPC601 allows access to the I/O controller 
interface regardless of the setting of MSR[DT]. The 
PowerPC architecture does not allow these 

accesses when MSRI[DT] is cleared. 

































Variance 


H.3 Implementation-Dependent Extensions 


Implementation-dependent extensions include features that are not part of, but are allowed 
by, the PowerPC architecture. Note that there are a number of such extensions that are 
described throughout the user’s manual, and there is no attempt to list them exhaustively in 
this appendix. Table H-3 provides a brief list of key implementation-dependent extensions 
supported by the MPC601. 


The MPC601 does not implement the PowerPC 
tlbsync instruction, but instead requires the use of 
a sync instruction to synchronize the completion of 
a broadcast tlbie instruction. 


Section 10.3, 
"Instructions Not 

Implemented by the 
MPC601” 


Section 6.1.8, "Effects of 
Instruction Prefetch on 
MMU” 


Operating 
environment 

















PowerPC architecture defines a "Guarded" 
memory attribute used to protect volatile memory. 
This attribute is associated with each virtual page 
(guarded bit in the page table entry) and with 
physical memory. The MPC601 provides a similar 
function using the "Caching Inhibited” memory 
attribute. 


Operating 
environment 
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Table H-3. Implementation-Dependent Extensions 


The MPC601 includes the following Section 2.3.3.12.1, Operating Implementation 
implementation-specific registers: "Checkstop Sources and | environment -dependent 
HIDO, HID1, HID2, HIDS5, HID15. These may or Enables Register— extensions 
may not be included in future implementations. HIDO,” through Section 

2.3.3.12.5, “Processor 

Identification Register 

(PIR)—HID15” 


Because the MPC601 automatically handles all Operating Implementation 
floating-point data types, the MPC601 floating-point environment -dependent 


assist exception defined in the PowerPC extension 
architecture (x'00E00’) would never be taken by the 
MPC601, and therefore it is not implemented. 


The MPC601 supports an additional exception Section 5.4.12, "Run Operating Implementation 
called the run mode exception in addition to the Mode/Trace Exception environment -dependent 
exceptions defined by the PowerPC architecture. (x'02000’)” extension 


The MPC601 includes a feature that supports 256- | Section 6.5.2.1, "VO Operating Implementation 
Mbyte translation capability. This is enabled when Controller Interface environment -dependent 

the T-bit is set and the BUID field is = x'07F’ inthe | Address Translation: T=1 extension 
appropriate segment register(s). in Segment Register” 





H.4 Options to the PowerPC Architecture 


Table H-4 lists options to the PowerPC architecture supported by the MPC601. Note that 
because these are optional, they may not be supported by all PowerPC processors, just as 
the MPC601 does not support other optional features supported by the architecture. 


Table H-4. Options to the PowerPC Architecture 


The MPC601 processor implements the external Section 2.3.3.9, “External | Operating Optional 
access register (EAR), which is optional to the Access Register (EAR)” environment 

PowerPC architecture. Note that only four bits (28— 

31) are implemented in the MPC601, whereas the 

PowerPC architecture defines six bits. 


The MPC601 implements the eciwx and ecowx Section 3.9, "External Optional 
instructions that are optional to the PowerPC Control Instructions” 

architecture but are required for use with the EAR 

register. 
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About This Book 


The primary objective of this user’s manual is to define the functionality of the MPC601 
microprocessor for use by software and hardware developers. The MPC6()1 processor is the 
first in the family of PowerPC™ microprocessors, and can provide a reliable foundation for 
developing products compatible with subsequent processors in the PowerPC family. 
However, the MPC601 provides a bridge between the POWER architecture and the 
PowerPC architecture, and as a result there are aspects of the MPC601 processor that are 
different from the PowerPC architecture. Therefore, a secondary objective of this manual 
is to describe how the MPC6()1 processor differs from the PowerPC architecture. 





The PowerPC architecture is comprised of the following components: 


¢ PowerPC user instruction set architecture—This includes the base user-level 
instruction set (excluding a few user-level cache-control instructions), user-level 
registers, programming model, data types, and addressing modes. 


¢ PowerPC virtual environment architecture—This describes the semantics of the 
memory model that can be assumed by software processes and includes descriptions 
of the cache model, cache-control instructions, address aliasing, and other related 
issues. Implementations that conform to the PowerPC virtual environment 
architecture also adhere to the PowerPC user instruction set architecture, but may 
not necessarily adhere to the PowerPC operating environment architecture. 


¢ PowerPC operating environment architecture—This includes the structure of the 
memory management model, supervisor-level registers, and the exception model. 
Implementations that conform to the PowerPC operating environment architecture 
also adhere to the PowerPC user instruction set architecture and the PowerPC virtual 
environment architecture. 


It is beyond the scope of the manual to provide a thorough description of the PowerPC 
architecture. It must be kept in mind that each PowerPC processor is a unique 
implementation of the PowerPC architecture. 


For readers of this manual who are concerned about compatibility issues regarding 
subsequent PowerPC processors, it is critical to read Chapter 1, “Overview,” and in 
particular Section 1.3, “MPC601 as a PowerPC Implementation,” which outlines in a very 
general manner the components of the PowerPC architecture, and indicates where and how 
the MPC6()1 diverges from the PowerPC definition. Instances where the MPC6()1 differs 
from the PowerPC architecture are noted throughout the manual. 
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Audience 


This manual is intended for system software and hardware developers and applications 
programmers who want to develop products for the MPC60)1 microprocessor and PowerPC 
processors in general. It is assumed that the reader understands operating systems, 
microprocessor system design, and the basic principles of RISC processing. 


Organization 


Following is a summary and a brief description of the major sections of this manual: 


XXxii 


Chapter 1, “Overview,” is useful for readers who want a general understanding of 
the features and functions of the PowerPC architecture and the MPC60)1 processor. 
This chapter also provides a general description of how the MPC60)1 differs from 
the PowerPC architecture. 


Chapter 2, “Registers and Data Types,” is useful for software engineers who need to 
understand the PowerPC programming model and the functionality of the registers 
implemented in the MPC601. This chapter also describes PowerPC conventions for 
storing data in memory. 


Chapter 3, “Addressing Modes and Instruction Set Summary,” provides an 
overview of the PowerPC addressing modes and a description of the instructions 
implemented by the MPC601, including the portion of the PowerPC instruction set 
and the additional instructions implemented by the MPC601. 


Specific differences between the MPC601 implementation and the PowerPC 
implementation of individual instructions are noted. 


Chapter 4, “Cache and Memory Unit Operation,” provides a discussion of cache 
timing, look-up process, MES] protocol, and interaction with other units. This 
chapter contains information that pertains both to the PowerPC virtual environment 
architecture and to the specific implementation in the MPC601. 


Chapter 5, “Exceptions,” describes the exception model defined in the PowerPC 
operating environment architecture and the specific exception model implemented 
in the MPC601. 


Chapter 6, “Memory Management Unit,” provides descriptions of operation of the 
MMU, interaction with other units, and address translation. Although this chapter 
does not provide an in-depth description of both the 64-bit and 32-bit memory 
management model defined by the PowerPC operating environment architecture, it 
does note differences between the defined 32-bit PowerPC definition and the 
MPC601 memory management implementation. 


Chapter 7, “Instruction Timing,” provides information about latencies, interblocks, 
special situations, and various conditions to help make programming more efficient. 
This chapter is of special interest to software engineers and system designers. 
Because each PowerPC implementation is unique with respect to instruction timing, 
this chapter primarily contains information specific to the MPC601. 
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¢ Chapter 8, “Signal Descriptions,” provides descriptions of individual signals of the 
MPC60I. 


¢ Chapter 9, “System Interface Operation,” describes signal timings for various - 
operations. It also provides information for interfacing to the MPC6()1. 


¢ Chapter 10, “Instruction Set,” functions as a handbook of the PowerPC instruction 
set. It provides opcodes, sorted by mnemonic, as well as a more detailed description 
of each instruction. Instruction descriptions indicate whether an instruction is part of 
the PowerPC base architecture or if it is specific to the MPC601. Each description 
indicates any differences in how the MPC6()1 implementation differs from the 
PowerPC implementation. The descriptions also indicate the privilege level of each 
instruction and which execution unit or units executes the instruction. 


¢ Appendix A, “Instruction Set Listings,” lists the superset of PowerPC and MPC6()1 
instructions. . 

¢ Appendix B, “POWER Architecture Cross Reference,” describes the relationship 
between the MPC601 and the POWER architecture. 


¢ Appendix C, “PowerPC Instructions Not Implemented in MPC601,” describes the 
set of PowerPC instructions not implemented in the MPC6()1 processor. 


¢ Appendix D, “Classes of Instructions,” describes how instructions are classified 
from the perspective of the PowerPC architecture. 


¢ Appendix E, ‘““Multiple-Precision Shifts,” describes how multiple-precision shifts 
can be programmed. 


¢ Appendix F, “Floating-Point Models,” gives examples of how the floating-point 
conversion instructions can be used to perform various conversions. 


¢ Appendix G, “Synchronization Programming Examples,” gives examples showing 
how synchronization instructions can be used to emulate various synchronization 
primitives and how to provide more complex forms of synchronization. 


¢ This manual also includes a glossary and an index. 


Suggested Reading 


This section lists additional reading that provides background to the information in this 
manual. 


¢ John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative 
Approach, Morgan Kaufmann Publishers, Inc., San Mateo, CA 


Conventions 
This document uses the following notational conventions: 


ACTIVE_HIGH Names for signals that are active high are shown in uppercase text 
without an overbar. 


ACTIVE_LOW A bar over a signal name indicates that the signal is active low—for 
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example, ARTRY (address retry) and TS (transfer start). Active-low 
signals are referred to as asserted (active) when they are low and 
negated when they are high. Signals that are not active-low, such as 
APO-—AP3 (address bus parity signals) and TTO-TT4 (transfer type 
signals) are referred to as asserted when they are high and negated 
when they are low. 


sync Courier monospaced type indicates code examples. 

mnemonics Instruction mnemonics are shown in lowercase bold. 

italics Italics indicate variable command parameters, for example, bectrx 
X’OF’ Hexadecimal numbers 

b’0011’ Binary numbers 

rAlO The contents of a specified GPR or the value 0. 

REG[FIELD] Abbreviations or acronyms for registers are shown in uppercase 


text. Specific bit fields or ranges are shown in brackets. 


Xx In certain contexts, such as a signal encoding, this indicates a don’t 
care. For example, if TTO-TT3 are binary encoded b’x001’, the 
state of TTO is a don’t care. 


Acronyms and Abbreviations 


The Table i contains acronyms and abbreviations that are used in this document: 


Table i. Acronyms and Abbreviated Terms 


ee [tee me | 



















Branch processing unit 


PU 
DAE/source instruction service 
register 
AR Cache address register External access register 


EA 
Complementary metal-oxide ECC Error checking and correction 
semiconductor 
| Common on-chip processor FPECR Floating-point exception cause 
register 


COP 
CTR 


Count register FPSCR Floating-point status and control 
register 
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Table i. Acronyms and Abbreviated Terms (Continued) 
es 
FP Floating-point unit POWER Performance Optimized with 
Enhanced RISC 
PR | Privilege level bit 
P 
P 





















[ABR Instruction address breakpoint 
register 








SPRGn General SPR 


SR Segment register 


Machine status save/restore 
register 0 


SRR1 Machine status save/restore 
register 1 


Time base register 
Translation lookaside buffer. 


Transistor-to-transistor logic 


R 


Memory descriptor register 


Modified/exclusive/shared/invalid— 
cache coherency protocol 


U = , 
— 
feat [hareaonear———~=S« PES =| Pape loentyomus 
[soca | 
Electronics Engineers 
es oe 
buffer 
U 
B ‘ 


4/2] o 
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-|o|™ a 
fo) 











MMU Memory management unit 


MMU 
Most-significant byte 
SR 


fmsb Most-significant bit 












Machine state register | 
eteninbe | 
[Neopeaton | 


a Processor identification tag 


S 


LB Unified translation look-aside buffer 







WIM Write-through/cache- 
inhibited/memory-coherency 


enforced bits 












XER Integer exception register 


Processor identification register 
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Differences between IBM and Motorola Terminology 


Table ii describes terminology conventions used in this manual, noting in particular the 
differences between IBM and Motorola usages. 


Table ii. Differences between IBM and Motorola Terminology 


ee ee 
Ce 


Programmable I/O (PIO) VO controller interface 
operation 
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Chapter 1 
Overview 


This chapter provides an overview of the MPC601 features, including a block diagram 
showing the major functional components. It also provides an overview of the PowerPC 
architecture and hardware design conventions adapted for current and forthcoming 
PowerPC processors, and information about how the MPC601 implementation differs or 
augments these architectural and hardware definitions. 


1.1 MPC601 Overview 


This section describes the features of the MPC601, provides a block diagram showing the 
major functional units, and gives an overview of how the MPC6()1 operates. 


The MPC601 is the first implementation of the PowerPC™ family of reduced instruction 
set computer (RISC) microprocessors. The MPC601 is a 32-bit implementation of the 
64-bit PowerPC architecture. It is a superscalar processor capable of issuing and retiring 
three instructions per clock, one to each of three execution units. Instructions can complete 
out of order for increased performance; however, the MPC6()1 makes execution appear 
sequential. 


The MPC601 integrates three execution units—an integer unit (IU), a branch processing 
unit (BPU), and a floating-point unit (FPU). The ability to execute three instructions in 
parallel and the use of simple instructions with rapid execution times yield high efficiency 
and throughput for MPC601-based systems. Most integer instructions execute in one clock 
cycle. The FPU is pipelined so a single-precision multiply-add instruction can be issued 
every clock cycle. 


The MPC601 includes an on-chip, 32-Kbyte, eight-way set-associative, physically 
addressed, unified instruction and data cache and an on-chip memory management unit 
(MMU). The MMU contains a 256-entry, two-way set-associative, unified translation look- 
aside buffer (UTLB) and provides support for demand paged virtual memory address 
translation and variable-sized block translation. Both the UTLB and the cache use least 
recently used (LRU) replacement algorithms. 


The MPC601 has a high-bandwidth, 64-bit data bus and a 32-bit address bus. The MPC6()1 
interface protocol allows multiple masters to compete for system resources through a 
central external arbiter. Additionally, on-chip snooping logic maintains cache coherency in 
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multiprocessor applications. The MPC601 supports single-beat and burst data transfers for 
memory accesses; it also supports both memory-mapped I/O and I/O controller interface 
addressing. 


The MPC601 uses an advanced, 3.6-V CMOS process technology and maintains full 
interface compatibility with TTL devices. 


1.1.1 MPC601 Features 


This section describes features specific to the MPC6(0)1. Note that general characteristics of 
the PowerPC architecture and hardware conventions among the family of PowerPC 
processors are listed in Section 1.3.1, “Features.” Major features of the MPC601 are as 
follows: 


¢ High-performance, superscalar microprocessor 


— As many as three instructions in execution per clock (one to each of the three 
execution units) 


— Single clock cycle execution for most instructions 

— Pipelined FPU for all single-precision and most double-precision operations 
¢ Three independent execution units and two register files 

— BPU featuring static branch prediction 

— A32-bit IU 

— Fully IEEE 754-compliant FPU for both single- and double-precision operations 

— Thirty-two GPRs for integer operands 

— Thirty-two FPRs for single- or double-precision operands 
¢ High instruction and data throughput 

— Condition register (CR) look-ahead operations performed by BPU 

— Zero-cycle branch capability 

— Programmable static branch prediction on unresolved conditional branches 


— Instruction unit capable of prefetching eight instructions per clock from the 
cache 


— Aprefetch queue that can hold as many as eight instructions that provides look- 
ahead capability 


— Interlocked pipelines with feed- -forwarding that control data dependencies in 
hardware 


— Unified 32-Kbyte cache—eight-way set-associative, physically addressed; LRU 
replacement algorithm 


— Cache write-back or write-through open programmable on a per page or per 
block basis 


— Memory unit with a two-element read queue and a three-element write queue 
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— Run-time reordering of loads and stores 
— BPU that performs condition register (CR) look-ahead operations 
— Programmable static branch prediction on unresolved conditional branches 


— Address translation facilities for 4-Kbyte page size, variable block size, and 
256-Mbyte segment size 


— A 256-entry, two-way set-associative UTLB 
— Four-entry, first-level ITLB 
— Hardware table search (caused by UTLB misses) through hashed page tables 
— 52-bit virtual address; 32-bit physical address 
— Four-entry BTLB providing 128-Kbyte to 8-Mbyte blocks 
¢ Facilities for enhanced system performance 
— Bus speed defined as selectable division of operating frequency 
— A 64-bit split-transaction external data bus with burst transfers 
— Support for address pipelining and limited out-of-order bus transactions 
— Snooped copyback queues for cache block (sector) copyback operations 
— Bus extensions for I/O controller interface operations 
— Multiprocessing support features that include the following: 
— Hardware enforced, four-state cache coherency protocol (MESI) 
— Separate port into cache tags for bus snooping 


1.1.2 Block Diagram 


Figure 1-1 provides a block diagram of the MPC601 that illustrates how the execution 
units—IU, FPU, and BPU—operate independently and in parallel. 


The MPC601's 32-Kbyte, unified cache tag directory has a port dedicated to snooping bus 
transactions, preventing interference with processor access to the cache. The MPC6()1 also 
provides address translation and protection facilities, including a UTLB and a BTLB, and 
a four-entry ITLB that contains the four most recently used instruction address translations 
for fast access by the instruction unit. 


Instruction prefetching and issuing is handled in the instruction unit. Translation of 
addresses for cache or external memory accesses are handled by the memory management 
unit. Both units are discussed in more detail in Sections 1.1.3, “Instruction Unit,” and 1.1.5, 
“Memory Management Unit (MMU).” 
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INSTRUCTION UNIT 
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MEMORY UNIT 
READ WRITE QUEUE 
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32-BIT ADDRESS BUS 


Figure 1-1. MPC601 Block Diagram 
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1.1.3 Instruction Unit 


As shown in Figure 1-1, the MPC601 instruction unit, which contains an instruction queue 
and the BPU, provides centralized control of instruction flow to the execution units. The 
instruction unit determines the address of the next instruction to be fetched based on 
information from a sequential fetcher and the BPU. The instruction unit also enforces 
pipeline interlocks and controls feed-forwarding. 


The sequential fetcher is a dedicated adder that computes the address of the next sequential 
instruction based on the address of the last fetch and the number of words accepted into the 
queue. The BPU searches the bottom half of the instruction queue for a branch instruction 
and uses static branch prediction on unresolved conditional branches to allow the 
instruction fetch unit to prefetch instructions from a predicted target instruction stream 
while a conditional branch is evaluated. The BPU also folds out branch instructions for 
unconditional branches. 


Instructions issued beyond a predicted branch do not complete execution until the branch 
is resolved, preserving the programming model of sequential execution. If any of these 
instructions are to be executed in the BPU, they are decoded but not issued to the BPU. If 
any of these are FPU and IU instructions, they are issued and allowed to complete up to the 
register write-back stage, but no write back is performed. 


When a correctly predicted branch is resolved, instruction execution continues without 
interruption along the predicted path. If branch prediction is incorrect, the instruction 
fetcher flushes all instructions from the instruction queue. Instruction issue then resumes 
with the instruction from the correct path. Instructions are never issued to the IU or FPU 
unless they must be executed by the program. 


1.1.3.1 Instruction Queue 


The instruction queue, shown in Figure 1-2, contains instructions prefetched from the 
current instruction stream. 


The instruction unit prefetches instructions from the cache into the instruction queue. As 
many as eight instructions (a cache sector) can be loaded into the instruction queue during 
any cycle. Instructions move from the top of the queue (Q7) towards the bottom of the 
queue (QO) and a full range of shift amounts through the queue is supported. 


The upper half of the instruction queue (Q4-Q7) provides buffering to reduce the need to 
access the cache. Some initial decoding of instructions is performed in the lower half (QO 
through Q3) of the queue. QO functions as the initial decode stage for the IU. 


As instructions issue to the BPU and FPU, new instructions are loaded into the queue. 
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FROM CACHE 





TO BPU 
TO FPU 


TOU 


Figure 1-2. Instruction Queue 


1.1.4 Independent Execution Units 


One benefit of the PowerPC architecture is its support for independent floating-point, 
integer, and branch processing execution units, making it possible to implement advanced 
features such as look-ahead operations and out-of-order instruction dispatches. For 
example, since branch instructions do not depend on GPRs or FPRs, branches can often be 
resolved early, eliminating stalls caused by taken branches. Additionally, upon resolution 
of the branch, the branch instruction is removed from the pipeline and fetching continues 
from the first instruction in the target stream. This procedure is called branch folding. 


The following sections describe the MPC60)1’s three execution units—the BPU, IU, and 
FPU. 


1.1.4.1. Branch Processing Unit (BPU) 


The BPU performs condition register (CR) look-ahead operations on conditional branches. 
The BPU looks through the bottom half of the instruction queue for a conditional branch 
instruction and attempts to resolve it early, achieving the effect of a zero-cycle branch in 
many cases. 


The BPU uses a bit in the instruction encoding to predict the direction of the conditional 
branch. Therefore when an unresolved conditional branch instruction is encountered, the 
MPC6()1 prefetches instructions from the predicted target stream until the conditional 
branch is resolved. 


The BPU contains an adder to compute branch target addresses and three special-purpose, 
user-control registers—the LR, the CTR, and the CR. The BPU calculates the return pointer 
for subroutine calls and saves it into the LR. The LR also contains the branch target address 
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for the Branch Conditional to Link Register (belrx) instruction. The CTR contains the 
branch target address for the Branch Conditional to Count Register (bectrx) instruction. 
The contents of the LR and CTR can be copied to or from any GPR. Because the BPU uses 
dedicated registers rather than general-purpose or floating-point registers, execution of 
branch instructions is independent from execution of integer and floating-point 
instructions. 


1.1.4.2 Integer Unit (IU) 


The IU executes all integer and memory access instructions (including those required for 
floating-point registers). The IU contains an arithmetic logic unit (ALU), a multiplier, a 
divider, the integer exception register (XER), and the general-purpose register file. One 
instruction can be issued to the IU each clock cycle. 


The IU interfaces with the cache and MMU for all instructions that access memory. 
Addresses are formed by adding the source 1 register operand specified by the instruction 
(or zero) to either a source 2 register operand or to a 16-bit, immediate value embedded in 
the instruction. 


Load and store instructions are issued and translated in program order; however, the 
accesses can occur out of order. These accesses can be strictly ordered through the use of 
synchronizing instructions. 


Load and store instructions are considered to have completed execution after the address is 
translated. If the address for a load or store instruction hits in the UTLB or BTLB and it is 
aligned, the instruction execution takes one clock cycle, allowing back-to-back issue of 
load and store instructions. 


1.1.4.3 Floating-Point Unit (FPU) 


The FPU contains a single-precision multiply-add array, a divider, the floating-point status 
and control register (FPSCR), and the FPRs. The multiply-add array allows the MPC601 to 
efficiently implement floating-point operations such as multiply, add, and multiply-add. 
The FPU is pipelined so that most single-precision instructions and many double-precision 
instructions can be issued back-to-back. The FPU contains two additional instruction 
queues. These queues allow floating-point instructions to be issued from the instruction 
queue even if the FPU is busy, making instructions available for issue to the other execution 
units. 


Like the BPU, the FPU can access instructions from the bottom half of the instruction queue 
(Q3-Q0), which permits floating-point instructions that do not depend on unexecuted 
instructions to be issued early to the FPU, thus maximizing efficiency and reducing 
bottlenecks in the instruction pipeline. 


All IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) 
are supported in hardware on the MPC6(0)1, which eliminates the latency incurred by 
software exception routines to support all data types. 
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1.1.5 Memory Management Unit (MMU) 


The MPC601’s MMU supports up to 4 Terabytes (252) of virtual memory and 4 Gigabytes 
(232) of physical memory. The MMU also controls access privileges for these spaces on 
block and page granularities. Referenced and changed status are maintained by the 
processor for each page to assist implementation of a demand-paged virtual memory 
system. 


The instruction unit generates all instruction addresses; these addresses are both for 
sequential instruction prefetches and addresses that correspond to a change of program 
flow. The integer unit generates addresses for data accesses (both for memory and the I/O 
controller interface). 


After an address is generated, the upper order bits of the logical address are translated by 
the MMU into physical address bits. Simultaneously, the lower order address bits (that are 
untranslated and therefore considered both logical and physical), are directed to the on-chip 
cache where they form the index into the eight-way set-associative tag array. After 
translating the address, the MMU passes the higher-order bits of the physical address to the 
cache, and the cache lookup completes. For cache-inhibited accesses or accesses that miss 
in the cache, the untranslated lower order address bits are concatenated with the translated 
higher-order address bits; the resulting 32-bit physical address is then used by the memory 
unit and the system interface, which accesses external memory. 


The MMU also directs the address translation and enforces the protection hierarchy 
programmed by the operating system in relation to the supervisor/user privilege level of the 
access and in relation to whether the access is a load or store. 


For instruction accesses, the MMU first performs a lookup in the four entries of the ITLB 
for the physical address translation. Instruction accesses that miss in the ITLB and all data 
accesses cause a lookup in the UTLB and BTLB for the physical address translation. In 
most cases, the physical address translation resides in one of the TLBs and the physical 
address bits are readily available to the on-chip cache. In the case where the physical 
address translation misses in the TLBs, the MPC6()1 automatically performs a search of the 
translation tables in memory using the information in the SDR1 and the corresponding 
segment register. 


Memory management in the MPC601 is described in more detail in Section 1.3.6.2, 
“MPC601 Memory Management.” 


1.1.6 Cache Unit 


The MPC601 contains a 32-Kbyte, eight-way set associative, unified (instruction and data) 
cache. The cache line size is 64 bytes, divided into two eight-word sectors, each of which 
can be snooped, loaded, cast-out, or invalidated independently. The cache is designed to 
adhere to a write-back policy, but the MPC601 allows control of cacheability, write policy, 
and memory coherency at the page and block level. The cache uses a least recently used 
(LRU) replacement policy. 
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The cache provides an eight-word interface to the rest of the device. The surrounding logic 
selects, organizes, and forwards the requested information to the requesting unit. Write 
operations to the cache can be performed on a byte basis, and a complete read-modify-write 
operation to the cache can occur in each cycle. 


The instruction unit provides the cache with the address of the next instruction to be 
prefetched. In the case of a cache hit, the cache returns the instruction and as many of the 
instructions following it as can be placed in the eight-word instruction queue up to the 
cache sector boundary. If the queue is empty, as many as eight words (an entire sector) can 
be loaded into the queue in parallel. 


The cache has one address port dedicated to instruction fetch and load/store accesses and 
one dedicated to snooping transactions on the system interface. Therefore, snooping does 
not require additional clock cycles unless a snoop hit that requires a cache status update 
occurs. 


1.1.7 Memory Unit 


The MPC601’s memory unit contains read and write queues that buffer operations between 
the external interface and the cache. These operations are comprised of operations resulting 
from load and store instructions that are cache misses and read and write operations 
required to maintain cache coherency, and table search operations. As shown in Figure 1-3, 
the read queue contains two elements and the write queue contains three elements. Each 
element of the write queue can contain as many as eight words (one sector) of data. One 
element of the write queue, marked snoop in Figure 1-3, is dedicated to writing cache 
sectors to system memory after a modified sector is hit by a snoop from another processor 
or snooping device on the system bus. The use of this queue guarantees a high priority 
operation that ensures a deterministic response time when snooping hits a modified sector. 












DATA 
(from cache) 


ADDRESS 
(from cache) 









READ WRITE QUEUE 


ADDRESS 


Figure 1-3. Memory Unit 


The other two elements in the write queue are used for store operations and writing back 
modified sectors that have been deallocated by updating the queue; that is, when a cache 
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location is full, the least-recently used cache sector is deallocated by first being copied into 
the write queue and from there to system memory. Note that snooping can occur after a 
sector has been pushed out into the write queue and before the data has been written to 
system memory. Therefore, to maintain a coherent memory, the write queue elements are 
compared to snooped addresses in the same way as the cache tags. If a snoop hits a write 
queue element, the data is first stored in system memory before it can be loaded into the 
cache of the snooping bus master. Full coherency checking between the cache and the write 
queue prevents dependency conflicts. 


Execution of a load or store instruction is considered complete when the associated address 
translation completes, guaranteeing that the instruction has completed to the point where it 
is known that it will not generate an internal exception. However, after address translation 
is complete, a read or write operation can still generate an external exception. 


Load and store instructions are always issued and translated in program order with respect 
to other load and store instructions. However, a load or store operation that hits in the cache 
can complete ahead of those that miss in the cache. The MPC601 ensures memory 
consistency by comparing target addresses and prohibiting instructions from completing 
out of order if an address matches. Load and store operations can be forced to execute in 
strict program order by using the synchronization instructions. 


1.1.8 System interface 


Because the cache on the MPC601 is an on-chip, write-back primary cache, the 
predominant type of transaction for most applications is burst-read memory operations, 
followed by burst-write memory operations, I/O controller interface operations, and single- 
beat (noncacheable or write-through) memory read and write operations. Additionally, 
there can be address-only operations, variants of the burst and single-beat operations 
(global memory operations that are snooped, and atomic memory operations, for example), 
and address retry activity (for example, when a snooped read access hits a modified line in 
the cache). 


Memory accesses can occur in single-beat and four-beat burst data transfers. The address 
and data buses are independent for memory accesses to support pipelining and split 
transactions. The MPC601 can pipeline as many as two transactions and has limited support 
for out-of-order split-bus transactions. 


Memory is accessed through an arbitration mechanism that allows devices to compete for 
bus mastership. This arbitration mechanism is flexible, allowing the MPC601 to be 
integrated into systems that implement various fairness and bus-parking procedures to 
avoid arbitration overhead. Additional multiprocessor support is provided through 
coherency mechanisms that provide snooping, external control of the on-chip cache and 
TLB, and support for a secondary cache. Multiprocessor software support is provided 
through the use of atomic memory operations. 
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Typically, memory accesses are weakly ordered—sequences of operations, including 
load/store string and multiple instructions, do not necessarily complete in the order they 
begin—maximizing the efficiency of the bus without sacrificing coherency of the data. The 
MPC601 allows read operations to precede store operations (except when a dependency 
exists, of course). In addition, the MPC601 may reorder high priority store operations 
ahead of lower priority store operations. Because the processor can dynamically optimize 
run-time ordering of load/store traffic, overall performance is improved. 


1.2 Levels of the PowerPC Architecture 


The PowerPC architecture consists of the following layers, and adherence to the PowerPC 
architecture can be measured in terms of which of the following levels of the architecture 
is implemented: 


¢ PowerPC user instruction set architecture—This definition includes the base user- 
level instruction set (excluding a few user-level memory-control instructions), user- 
level registers, programming model, data types, and addressing modes. 


Aspects of the PowerPC user instruction set architecture are discussed in Chapter 2, 
“Registers and Data Types,” Chapter 3, “Addressing Modes and Instruction Set 
Summary,” and Chapter 10, “Instruction Set.” 


* PowerPC virtual environment architecture—PowerPC virtual environment 
architecture—This describes the semantics of the memory model that can be 
assumed by software processes and includes descriptions of the cache model, cache- 
control instructions, address aliasing, and other related issues. Implementations that 
conform to the PowerPC virtual environment architecture also adhere to the 
PowerPC user instruction set architecture, but may not necessarily adhere to the 
PowerPC operating environment architecture. 


Aspects of the PowerPC virtual environment architecture are discussed in 
Chapter 2, “Registers and Data Types,” Chapter 3, “Addressing Modes and 
Instruction Set Summary,” Chapter 5, “Exceptions,” and Chapter 10, “Instruction 
Set.” 


¢ PowerPC operating environment architecture—This includes the structure of the 
memory management model, supervisor-level registers, and the exception model. 
Implementations that conform to the PowerPC operating environment architecture 
also adhere to the PowerPC user instruction set architecture and the PowerPC virtual 
environment architecture definition. 


Aspects of the PowerPC operating environment architecture are discussed in 
Chapter 2, “Registers and Data Types,” Chapter 3, “Addressing Modes and 
Instruction Set Summary,” Chapter 5, “Exceptions,”’ Chapter 6, “Memory 
Management Unit,” and Chapter 10, “Instruction Set.” 


Note that while the MPC601 is said to adhere to the PowerPC architecture at all three 
levels, it diverges in aspects of its implementation to a greater extent than can be expected 
of subsequent PowerPC processors. Many of the differences result from the fact that the 
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MPC601 design is pivotal, providing compatibility with an existing architecture standard 
(POWER), while providing a reliable platform for hardware and software development 
compatible with subsequent PowerPC processors. 


The PowerPC architecture allows a wide range of designs for such features as cache and 
system interface implementations. 


1.3 MPC601 as a PowerPC Implementation 


The PowerPC architecture is derived from the IBM Performance Optimized with Enhanced 
RISC (POWER) architecture. The PowerPC architecture shares the benefits of the POWER 
architecture optimized for single-chip implementations. The architecture design facilitates 
parallel instruction execution and is scalable to take advantage of future technological 
gains. For compatibility, the MPC601 also implements instructions from the POWER user 
programming model that are not part of the PowerPC definition. 


This section describes the PowerPC architecture in general, noting where the. MPC601 
differs. The organization of this section follows the sequence of the chapters in this manual 
as follows: 


¢ Features—This section describes general features that the MPC601 shares with the 
PowerPC family of microprocessors. It does not list PowerPC features not 
implemented in the MPC601. 


¢ Registers and programming model—This section describes the architected registers 
for the operating environment architecture common among PowerPC processors 
and describes the programming model. It also describes differences in how the 
architected registers are used in the MPC601 and describes the additional registers 
that are unique to the MPC601: 


¢ Instruction set and addressing modes—This section describes the PowerPC 
instruction set and addressing modes for the PowerPC operating environment 
architecture, and it generally defines the subset of the instruction set implemented in 
the MPC601 as well as additional instructions implemented in the MPC601 but not 
defined in the PowerPC architecture. 


¢ Cache implementation—This section describes the cache model that is defined 
generally for PowerPC processors by the virtual environment architecture. It also 
provides specific details about the MPC601 cache implementation. 


¢ Exception model—This section describes the exception model of the PowerPC 
operating environment architecture and the differences in the MPC601 exception 
model. 


¢ Memory management—This section describes generally the conventions for 
memory management among the PowerPC processors. Note that the PowerPC 
operating environment architecture defines different memory management designs 
for 64- and 32-bit implementations. This section also describes the general 
differences between the MPC601 and the 32-bit PowerPC memory management 
specification. 
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¢ Instruction timing—This section provides a general description of the instruction 
timing provided by the superscalar, parallel execution supported by the PowerPC 
architecture. 

¢ System interface—This section describes the signals implemented on the MPC6()1. 


1.3.1 Features 
The MPC601 incorporates the following features of the PowerPC architecture: 
¢ High-performance, superscalar microprocessor implementations 


The PowerPC architecture allows optimizing compilers to schedule instructions to 
maximize performance through efficient use of the PowerPC instruction set and 
register model. The multiple, independent execution units allow compilers to 
maximize parallelism and instruction throughput. Compilers that take advantage of 
the flexibility of the PowerPC architecture can additionally optimize system 
performance of the PowerPC processors. 


The PowerPC architecture supports the following: 

— Multiple, independent execution units 

— Single clock cycle execution for most instructions 

— Fully IEEE 754-compliant FPU for both single- and double-precision operations 
— Thirty-two general-purpose registers (GPRs) for integer operands 


— Thirty-two floating-point registers (FPRs) for single- or double-precision 
operands 


¢ High instruction and data throughput 


— Cache write-back or write-through operation programmable on a per-page or 
per-block basis 


— Run-time reordering of loads and stores 
* Facilities for enhanced system performance 
— Programmable big- and little-endian byte ordering 
— Interprocessor UTLB invalidation 
— Interprocessor cache control operations 
— Atomic memory references 
¢ In-system testability and debugging features through boundary-scan capability 


Specific features of the MPC6()1 are listed in Section 1.1.1, “MPC601 Features.” 


1.3.2 Registers and Programming Model 


The following subsections describe the general features of the PowerPC registers and 
programming model and of the specific MPC601 implementation, respectively. 
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1.3.2.1 PowerPC Registers and Programming Model 


The PowerPC architecture defines register-to-register operations for most computational 
instructions. Source operands for these instructions are accessed from the on-chip registers 
or are provided as immediate values embedded in the instruction opcode. The three-register 
instruction format allows specification of a different target register from the two source 
registers. Data is transferred between memory and registers with explicit load and store 
instructions only. 


PowerPC processors have two levels of privilege—supervisor mode of operation (typically 
used by the operating environment) and one that corresponds to the user mode of operation 
(used by the application software). The programming models incorporate 32 GPRs, 32 
FPRs, special-purpose registers (SPRs), and several miscellaneous registers. Note that 
there are several registers that are part of the PowerPC architecture that are not 
implemented in the MPC601; for example, the time base registers are not implemented in 
the MPC601 and the address space register (ASR) is implemented only in 64-bit 
implementations. Likewise, each PowerPC implementation has its own unique set of 
hardware implementation (HID) registers, which are implementation-specific. 


This division allows the operating system to control the application environment 
(providing virtual memory and protecting operating-system and critical machine 
resources). Instructions that control the state of the processor, the address translation 
mechanism, and supervisor registers can be executed only when the processor is operating 
in supervisor mode. 


The following sections summarize the PowerPC registers that are implemented in the 
MPC601 processor. Chapter 2, “Register Models and Data Types,” provides detailed 
information about the registers implemented in the MPC601. 


1.3.2.1.1 General-Purpose Registers (GPRs) 

The PowerPC architecture defines 32 user-level, general-purpose registers (GPRs). These 
registers are either 32 or 64 bits wide depending on the implementation. The GPRs serve 
as the data source or destination for all integer instructions and provide addresses for all 
memory-access instructions. 


1.3.2.1.2 Floating-Point Registers (FPRs) 

The PowerPC architecture also defines 32 user-level 64-bit floating-point registers (FPRs) 
for both 32- and 64-bit PowerPC implementations. The FPRs serve as the data source or 
destination for floating-point instructions. These registers can contain data objects of either 


single- or double-precision floating-point formats. The floating-point register file can only 
be accessed by the FPU. 


1.3.2.1.3 Condition Register (CR) 


The CR is a 32-bit user-level register that consists of eight, four-bit fields that reflect the 
results of certain operations, such as move, integer and floating-point compare, arithmetic, 
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and logical instructions, and provide a mechanism for testing and branching. The CR is 32 
bits wide in all implementations. 


1.3.2.1.4 Floating-Point Status and Control Register (FPSCR) 


The floating-point status and control register (FPSCR) is a user-level register that contains 
all exception signal bits, exception summary bits, exception enable bits, and rounding 
control bits needed for compliance with the IEEE 754 standard. The FPSCR is 32 bits wide 
in all implementations. 


1.3.2.1.5 Machine State Register (MSR) 

The machine state register (MSR) is a supervisor-level register that defines the state of the 
processor. The contents of this register is saved when an exception is taken and restored 
when the exception handling completes. The MPC60)1 implements the MSR as a 32-bit 
register; other PowerPC processors implement it as a 64-bit register. 


1.3.2.1.6 Segment Registers (SRs) 

The sixteen 32-bit segment registers (SRs) are present only in 32-bit PowerPC 
implementations. Figure 2-12 shows the format of a segment register when the T bit is 
cleared and Figure 2-13 shows the layout when the T bit is set. The fields in the segment 
register are interpreted differently depending on the value of bit 0). Note that 64-bit 
PowerPC implementations use a segment table rather than the segment registers for 
segment information. 


1.3.2.1.7 Special-Purpose Registers (SPRs) 

The PowerPC operating environment architecture defines numerous special-purpose 
registers that serve a variety of functions, such as providing controls, indicating status, 
configuring the processor, and performing special operations. Some SPRs are accessed 
implicitly as part of executing certain instructions. All SPRs can be accessed by using the 
Move to/from Special Purpose Register instructions, mtspr and mfspr. 


In the MPC601, all SPRs are 32 bits wide. 


1.3.2.1.8 User-Level SPRs 
The following MPC601 SPRs are accessible by user-level software: 


¢ Link register (LR)—The link register can be used to provide the branch target 
address and to hold the return address after branch and link instructions. The LR is 
64 bits wide in 64-bit implementations and 32 bits wide in 32-bit implementations. 


¢ Count register (CTR)—The CTR is decremented and tested automatically as a result 
of branch-and-count instructions. The CTR is 64 bits wide in 64-bit 
implementations and 32 bits wide in 32-bit implementations. 


¢ The integer exception register (XER) contains the integer carry and overflow bits 
and two fields for the Load String and Compare Byte Indexed (Iscbx) instruction. 
The XER is 32 bits wide in all implementations. 
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Note that while these registers are defined as SPRs and can be accessed by using the mtspr 
and mfspr instructions, these registers are typically accessed implicitly. In addition, the 
PowerPC architecture defines a 64-bit time base register (TB), which replaces the real-time 
clock implementation on the MPC601. . 


1.3.2.1.9 Supervisor-Level SPRs 
The MPC6()1 also contains SPRs that can be accessed only by supervisor-level software. 
These registers consist of the following: 


1-16 


The 32-bit data access exception (DAE)/source instruction service register (DSISR) 
defines the cause of data access and alignment exceptions. 


The data address register (DAR) is a 32-bit register that holds the address of an 
access after an alignment or data access exception. 


Decrementer register (DEC) is a 32-bit decrementing counter that provides a 
mechanism for causing a decrementer exception after a programmable delay. 
PowerPC architecture defines that the DEC frequency be provided as a subdivision 
of the processor clock frequency; however, the MPC60)1 implements a separate RTC 
which also serves the DEC. 


The 32-bit table search description register 1(SDR1) specifies the page table format 
used in logical-to-physical address translation for pages. 


The machine status save/restore register () (SRRO) is a 32-bit register that is used by 
the MPC601 for saving the address of the instruction that caused the exception, and 
the address to return to when a Return from Interrupt (rfi) instruction is executed. 


The machine status save/restore register 1 (SRR1) is a 32-bit register used to save 
machine status on exceptions and to restore machine status when an rfi instruction 
is executed. 


General SPRs, SPRGO-SPRG3, are 32-bit registers provided for operating system 
use. 


The external access register (EAR) is a 32-bit register that controls access to the 
external control facility through the External Control Input Word Indexed (eciwx) 
and External Control Output Word Indexed (ecowx) instructions. 


The processor version register (PVR) is a 32-bit, read-only register that identifies the 
version (model) and revision level of the PowerPC processor. 


Block address translation (BAT) registers—The PowerPC architecture defines 16 
BAT registers, divided into four pairs of data BATs (DBATs) and four pairs of 
instruction bats (IBATs). The MPC601 includes four pairs of unified BATs 
(BATOU-—BAT3U and BATOL-BAT3L). See Figure 2-1 for a list of the SPR 
numbers for the BAT registers. Figure 2-23 and Figure 2-24 show the layout of the 
upper and lower BAT registers. Note that the format for the MPC601’s 
implementation of the BAT registers differs from the PowerPC architecture 
definition. | 
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In addition, 64-bit PowerPC processors implement a supervisor-level, 64-bit address space 
register (ASR) that defines the physical address of the segment tables in memory. 


1.3.2.2 MPC601 Programming Model and Additional Registers 
The MPC601 includes the following registers that are not part of the PowerPC architecture. 


¢ Real-time clock (RTC) registers—RTCU and RTCL (RTC upper and RTC lower). 
The RTCU register maintains the number of seconds from a time specified by 
software. The RTCL register maintains a fraction of the current second in 
nanoseconds. The contents of either register can be copied to any GPR. These 
registers are specific to the MPC601. These registers are not supported in the 
PowerPC architecture, which uses the time base facility rather than a separate real- 
time clock. For more information, see Section 2.2.5.3, “Real-Time Clock (RTC) 
Registers.” These registers are also implemented in the POWER architecture. 
PowerPC processors implement a time base based on the processor clock. 


* MQregister (MQ). The MQ register is a MPC601-specific, 32-bit register used as a 
register extension to accommodate the product for the multiply instructions and the 
dividend for the divide instructions. It is also used as an operand of long rotate and 
shift instructions. This register is provided for compatibility with POWER 
architecture, and is not part of the PowerPC architecture. For more information, see 
Section 2.2.5.1, “MQ Register (MQ).” The MQ register is typically accessed 
implicitly as part of executing a computational instruction. This register is also 
implemented in the POWER architecture. 


¢ Block-address translation (BAT) registers. The MPC601 includes eight block- 
address translation registers (BATs), consisting of four pairs of BATs (BATOU-— 
BAT3U and BATOL-BAT3L). See Figure 2-1 for a list of the SPR numbers for the 
BAT registers. Figure 2-23 and Figure 2-24 show the formats of the upper and lower 
BAT registers. Note that other PowerPC implementations have two sets of four BAT 
pairs—four sets of upper and lower IBAT's (which occupy the space of the unified 
BATs in the MPC601) and four sets of upper and lower DBATs (located in the 
subsequent eight positions at SPR numbers 536-543). The PowerPC architecture 
defines twice as many BAT registers—four IBAT pairs and four DBAT pairs. 


¢ The hardware implementation registers, HIDO-HID2, HIDS, and HID15 are 
provided primarily for debugging. For more information, see Section 2.3.3.12.1, 
“Checkstop Sources and Enables Register—HIDO” through Section 2.3.3.12.5, 
“Processor Identification Register (PIR)—HID15.” HID15 holds the four-bit 
processor identification tag (PID) that is useful for differentiating processors in 
multiprocessor system designs. For more information, see Section 2.3.3.12.5, 
“Processor Identification Register (PIR)—HID15.” Note that while it is not 
guaranteed that the implementation of HID registers is consistent anong PowerPC 
processors, other processors may be designed with similar or identical HID 
registers. 
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1.3.3 Instruction Set and Addressing Modes 


The following subsections describe the PowerPC instruction set and addressing modes in 
general. Differences in the MPC601’s instruction set are described in Section 1.3.3.2, 
“MPC601 Instruction Set.” 


1.3.3.1 PowerPC Instruction Set and Addressing Modes 


All PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formats 
are consistent among all instruction types, permitting efficient decoding to occur in parallel 
with operand accesses. This fixed instruction length and consistent format greatly 
simplifies instruction pipelining. In addition, each instruction is defined in a way that 
simplifies pipelined implementations and allows maximum realization of instruction-level 
parallelism. 


1.3.3.1.1 PowerPC Instruction Set 
The PowerPC instructions are divided into the following categories: 

¢ Integer instructions—These include computational and logical instructions. 
— Integer arithmetic instructions 
— Integer compare instructions 
— Integer logical instructions 
— Integer rotate and shift instructions 

¢ Floating-point instructions—These include floating-point computational 
instructions, as well as instructions that affect the floating-point status and control 
register (FPSCR). 
— Floating-point arithmetic instructions 
— Floating-point multiply/add instructions 
— Floating-point rounding and conversion instructions 
— Floating-point compare instructions 
— Floating-point status and control instructions 

¢ Load/store instructions—These include integer and floating-point load and store 
instructions. 
— Integer load and store instructions 
— Integer load and store multiple instructions 
— Floating-point load and store 
— Floating-point move instructions 

¢ Flow control instructions—These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. 
— Branch and trap instructions 
— Condition register logical instructions 
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¢ Processor control instructions—These instructions are used for synchronizing 
memory accesses and management of caches, UTLBs, and the segment registers. 


— Move to/from special purpose register instructions 


¢ Memory control instructions—These instructions provide control of caches, TLBs, 
and segment registers. 


— Supervisor-level cache management instructions 

— User-level cache instructions 

— Segment register manipulation instructions 

— Translation look-aside buffer management instructions 


Note that this grouping of the instructions does not indicate which execution unit executes 
a particular instruction or group of instructions. This information, which is useful in taking 
full advantage of superscalar parallel instruction execution, is provided in Chapter 7, 
“Instruction Timing,” and Chapter 10, “Instruction Set.” 


Integer instructions operate on byte, half-word, and word operands. Floating-point 
instructions operate on single-precision and double-precision floating-point operands. The 
PowerPC architecture uses instructions that are four bytes long and word-aligned. It 
provides for byte, half-word, and word operand loads and stores between memory and a set 
of 32 general-purpose registers (GPRs). It also provides for word and double-word operand 
loads and stores between memory and a set of 32 floating-point registers (FPRs). 


Computational instructions do not modify memory. To use a memory operand in a 
computation and then modify the same or another memory location, the memory contents 
must be loaded into a register, modified, and then written back to the target location with 
distinct instructions. 


PowerPC processors follow the program flow when they are in the normal execution state. 
However, the flow of instructions can be interrupted directly by the execution of an 
instruction or by an asynchronous event. Either kind of exception may cause one of several 
components of the system software to be invoked. 


A program references memory using the effective address computed by the processor when 
it executes a memory access or branch instruction, or when it fetches the next sequential 
instruction. 

1.3.3.1.2 Calculating Effective Addresses 


The effective address (EA) is the 32-bit address computed by the processor when executing 
a memory access or branch instruction or when fetching the next sequential instruction. 


The PowerPC architecture supports two simple memory addressing modes: 


* EA=(rAlO) + offset (including offset = 0) (register indirect with immediate index) 
¢ EA=(rAlQ) + rB (register indirect with index) 
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These simple addressing modes allow efficient address generation for memory accesses. 
Calculation of the effective address for aligned transfers occurs in a single clock cycle. 


For a memory access instruction, if the sum of the effective address and the operand length 
exceeds the maximum effective address, the storage operand is considered to wrap around 
from the maximum effective address to effective address 0). 


Effective address computations for both data and instruction accesses use 32-bit unsigned 
binary arithmetic. A carry from bit 0 is ignored. 


1.3.3.2 MPC601 Instruction Set 
The MPC601 instruction set is defined as follows. 


¢ The MPC601 implements the majority of the 32-bit instructions in the PowerPC 
architecture, and traps PowerPC instructions that it does not implement to the illegal 
instruction program exception handler for execution by a software envelope. These 
instructions are described in Appendix C, “PowerPC Instructions Not Implemented 
in MPC601.” 


¢ The MPC601 supports a number of POWER instructions that are otherwise not 
implemented in the PowerPC architecture. These are listed in Appendix B, 
“POWER Architecture Cross Reference.” Individual instructions are described in 
Chapter 10, “Instruction Set.” 


* The MPC601 implements the External Control Input Word Indexed (eciwx) and 
External Control Output Word Indexed (ecowx) instructions, which are optional in 
the PowerPC architecture definition. 


¢ Several of the instructions implemented in the MPC601 function somewhat 
differently than they are defined in the PowerPC architecture. These differences 
typically stem from design differences; for instance, the PowerPC architecture 
defines several cache control instructions specific to separate instruction and data 
cache designs. 


When executed on the MPC601, such instructions may provide a subset of the 
functions of the architected instruction or they may be no-ops. 


For a list of all PowerPC instructions and all MPC601-specific instructions, see 
Appendix A, “Instruction Set Listings.” Chapter 10, “Instruction Set,’ describes each 
instruction, indicating whether an instruction is MPC601-specific and describing any 
differences in the implementation on the MPC601. 


1.3.4 Cache Implementation 


- The following subsections describe the PowerPC cache implementation in general, and the 


MPC601-specific implementation, respectively. 


1.3.4.1 PowerPC Cache Implementation | 


PowerPC cache implementations are implementation-specific. For example, some 
PowerPC processors may have separate instruction and data caches (Harvard architecture), 
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while the MPC601 uses a unified cache. This causes PowerPC cache control instructions to 
work differently, but compatibly, when executed on the MPC60)1—for example the icbi 
(Instruction Cache Block Invalidate) instruction is treated as a no-op when executed by the 
MPC601. 


PowerPC implementations can control the following memory access modes on a page or 
block basis. 

* Write-back/write-through mode 

¢ Cache-inhibited mode 

¢ Memory coherency 


To ensure coherency among caches in a multiprocessor (or multiple caching-device) 
implementations, PowerPC processors support the MESI protocol. MESI stands for 
moditied/exclusive/shared/invalid. These four states indicate the state of the cache block as 
follows: 


* Modified—The cache block is modified with respect to system memory; that is, this 
cache block holds the only valid data for this address. 


¢ Exclusive—This cache block holds valid data that is identical to the data at this 
address in system memory. No other cache has this data. 


¢ Shared—tThis cache block holds valid data that is identical to this address in system 
memory and at least one other caching device. 


¢ Invalid—tThis cache block does not hold valid data. 


Note that in the MPC601 processor, a block is defined as an eight-word sector. The 
PowerPC virtual environment architecture also defines a set of cache control instructions. 


1.3.4.2 MPC601 Cache Implementation 


The MPC6()1 has a 32-Kbyte, eight-way set-associative unified (instruction and data) 
cache. The cache is physically addressed and can operate in either write-back or write- 
through mode. Either memory update policy can be selected on a per-page or per-block 
basis. 


The cache is configured as eight sets of 64 lines. Each line consists of two sectors, four state 
bits (two per sector), and an address tag. The two state bits implement the four-state MES] 
(modified-exclusive-shared-invalid) protocol. Each sector contains eight 32-bit words. 
Note that the PowerPC architecture defines the term block as the cacheable unit. For the 
MPC6()1 processor, the block is a sector. A block diagram of the cache organization is 
shown in Figure 1-4. 


Each cache line contains 16 contiguous words from memory that are loaded from a 
16-word boundary (that is, bits A26—-A31 of the logical addresses are zero); thus, a cache 
line never crosses a page boundary. Misaligned accesses across a page boundary can incur 
a performance penalty. 
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Cache operations are always performed on a sector basis (that is, the cache is snooped and 
updated and coherency is maintained on a per-sector basis). However, if the other sector in 
the line is marked invalid, an optional, low-priority update of that sector is attempted after 
the sector that contained the critical word is filled. This function can be disabled. An LRU 
algorithm is used to select the cache line. 


External bus transactions that load instructions or data into the cache always transfer the 
missed quad word first, regardless of its location in a cache sector; then the rest of the cache 
sector is filled. As the missed quad word is loaded into the cache, it is simultaneously 
forwarded to the appropriate execution unit so instruction execution resurnes as quickly as 
possible. 


Cache coherency is enforced by on-chip hardware bus snooping logic. Since the cache tag 
directory has a separate port dedicated to snooping bus transactions, bus snooping traffic 
does not interfere with processor access to the cache unless a snoop hit occurs. 
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1.3.5 Exception Model 


The following subsections describe the PowerPC exception model and the MPC601 
implementation, respectively. 
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1.3.5.1 PowerPC Exception Model 


The PowerPC exception mechanism allows the processor to change to supervisor state as 
a result of external signals, errors, or unusual conditions arising in the execution of 
instructions. When exceptions occur, information, such as the instruction that should be 
executed after control is returned to the original program and the contents of the machine 
state register, is saved to the save/restore registers (SRRO and SRR1), program control 
passes from user to supervisor level, and software continues execution at an address 
(exception vector) predetermined for each exception. 


Although multiple exception conditions can map to a single exception vector, the specific 
condition can be determined by examining a register associated with the exception—for 
example, the DAE/source instruction service register (DSISR) and the floating-point status 
and control register (FPSCR). Additionally, specific exception conditions can be explicitly 
enabled or disabled by software. 


Although the PowerPC architecture supports out-of-order instruction dispatch, exceptions 
are handled in program order; therefore, while exception conditions may be recognized out 
of order, they are handled strictly in order. When an instruction-caused exception is 
recognized, any unexecuted instructions that appear earlier in the instruction stream, 
including any that have not yet entered execute state, are allowed to complete. Any 
exceptions, caused by those instructions are handled in order. Likewise, exceptions that are 
asynchronous and precise are recognized when they occur, but are not handled until all 
instructions currently in execute stage successfully complete execution and report their 
results. 


Unless a catastrophic condition causes a system reset or machine check exception, only one 
exception is handled at a time. If, for example, a single instruction encounters multiple 
exception conditions, those conditions are encountered sequentially. After the exception 
handler handles an exception, the instruction execution continues until the next exception 
condition is encountered. This method of recognizing and handling exception conditions 
sequentially guarantees that exceptions are recoverable. 


Exception handlers should save the information saved in SRRO and SRR1 early to prevent 
the program state from being lost due to a system reset and machine check exception or to 
an instruction-caused exception in the exception handler. 
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The PowerPC architecture supports four types of exceptions: 


¢ Synchronous, precise—These are caused by instructions. All instruction-caused 
exceptions are handled precisely; that is, the machine state at the time the exception 
occurs is known and can be completely restored. This means that the precise address 
of the faulting instruction is provided to the exception handler and that neither the 
faulting instruction nor subsequent instructions in the code stream will complete 
execution. 


¢ Synchronous, imprecise mode—The PowerPC architecture permits the 
implementation of imprecise floating-point exceptions. The use of recoverable and 
nonrecoverable versions of this mode can be enabled or disabled be setting one of 
the FEO and FE! bits in the MSR. Note that in the MPC601, these bits are internally 
ORed together causing all floating-point exceptions to be handled precisely. 


¢ Asynchronous, precise—The external interrupt and decrementer exceptions are 
maskable asynchronous exceptions that are handled precisely. When these 
exceptions occur, their handling is postponed until all instructions, and any 
exceptions associated with those instructions, complete execution. 


¢ Asynchronous, imprecise—There are two non-maskable asynchronous exceptions 
that are imprecise: system reset and machine check exceptions. These exceptions 
may not be recoverable, or may provide a limited degree of recoverability for 
diagnostic purpose. 


The PowerPC architecture defines several of the exceptions differently than the MPC601 
implementation. For example, the PowerPC exception model provides a unique vector for 
the trace exception; the MPC60)1 vectors trace exceptions to the run-mode exception 
handler. Other differences are noted in the following section, Section 1.3.5.2, “MPC601 
Exception Model.” 


1.3.5.2 MPC601 Exception Model 

All MPC601 exceptions can be described as either precise or imprecise and either 
synchronous or asynchronous. Asynchronous exceptions are caused by events external to 
the processor’s execution; synchronous exceptions, which are all handled precisely by the 
MPC601, are caused by instructions. 


The MPC601 exception classes are shown in Table 1-1. 


Table 1-1. MPC601 Exception Classifications 
Synchronous/Asynchronous Precise/Imprecise Exception Type 
Asynchronous Imprecise Machine Check 

System Reset 
Asynchronous External interrupt 
Decrementer 


Synchronous Precise instruction-caused exceptions 
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Although exceptions have other characteristics as well, such as whether they are maskable 
or nonmaskable, the distinctions shown in Table 1-1 define categories of exceptions that the 
MPC6(0)1 handles uniquely. Note that Table 1-1 includes no synchronous imprecise 
instructions. While the PowerPC architecture supports imprecise handling of floating-point 
exceptions, this functionality is not implemented in the MPC601. 


The MPC601’s exceptions, and conditions that cause them, are listed in Table 1-2. 
Exceptions that are specific to the MPC6()1 are indicated. 


Table 1-2. Exceptions and Conditions 


Exception Vector Offset : as 
00100. Asystem reset is caused by the assertion of either SRESET or HRESET. 


Machine check | 00200 Amachine check is caused by the assertion of the TEA signal. 


Data access 00300 The cause of a data access exception can be determined by the bit settings in 

ie DSISR, listed as follows: 
Set if the translation of an attempted access is not found in the primary 
hash table entry group (HTEG), or in the rehashed secondary HTEG, or in 
the range of a BAT register; otherwise cleared. 
Set if a memory access is not permitted by the page or BAT protection 
mechanism described in Chapter 6, “Memory Management Unit”; otherwise 
cleared. 
Set if the access was to an I/O segment (SR[T] =1) by a load/store with 
reservation instruction; otherwise cleared. 
Set for a store operation and cleared for a load operation. 
Set if an EA matches the address in the DABR while in one of the three 
compare modes. 

11 Set if eciwx or ecowx is used and EAR[E] is cleared. 


Instruction An instruction access exception is caused when an instruction fetch cannot be 
access performed for any of the following reasons: 
¢ The effective address cannot be translated. That is, there is a page fault for 
this portion of the translation, so an instruction access exception must be 
taken to retrieve the translation from a storage device such as a hard disk 
drive. 
The fetch access is to an I/O segment. 
The fetch access violates memory protection. If the K bits in the segment 
register and the PP bits in the PTE are set to prohibit read access, 
instructions cannot be fetched from this location. 


00500 An external interrupt occurs when the INT signal is asserted. 
interrupt 
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Table 1-2. Exceptions and Conditions (Continued) 
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Floating-point 00800 
unavailable 



































An alignment exception is caused when the MPC601 cannot perform a memory 

access for one of the following reasons: 

¢ The operand of a floating-point load or store or load/store with reservation 
operation is in an I/O segment (SR[T]=1). 

* An Iscbx instruction crosses a page boundary. 

* The operand of a load or store (including string loads and stores) crosses a 
protection boundary. 

* The operand of an Imw or stmw instruction crosses a segment or BAT 
boundary. 

* The operand of a Data Cache Block Set to Zero (dcbz) instruction is in a 
page specified as write-through or cache-inhibited for a page-address 
translation access. 





Aprogram exception is caused by one of the following exception conditions, 

which correspond to bit settings in SRR1 and arise during execution of an 

instruction: 

* Floating-point enabled exception—A floating-point enabled exception 
condition is generated when the following condition is met: 

(MSR[FEOQ] | MSR[FE1]) & FPSCR[FEX] is 1. 

FPSCR[FEX] is set by the execution of a floating-point instruction that 
causes an enabled exception or by the execution of a “move to FPSCR” 
instruction that results in both an exception condition bit and its 
corresponding enable bit being set inthe FPSCR. 

¢ Illegal instruction—An illegal instruction program exception is generated 
when execution of an instruction is attempted with an illegal opcode or illegal 
combination of opcode and extended opcode fields, or when execution of an 
optional instruction not provided in the MPC601 is attempted (these do not 
include those optional instruction that are treated as no-ops). 

¢ Privileged instruction—A privileged instruction type program exception is 
generated when the execution of a privileged instruction is attempted and the 
MSR register user privilege bit, MSR[PR], is set. In the MPC601, this 
exception is generated for mtspr or mfspr with an invalid SPR field if 
SPR[0]=1 and MSR[PR]=1. This may not be true for all PowerPC 
processors. 

* Trap—A trap type program exception is generated when any of the 
conditions specified in a trap instruction is met. 

¢  |ilegal operations—The MPC601 takes illegal operation program exceptions 

for unimplemented PowerPC instructions. The PowerPC instruction set is 

described in Chapter 3, “Addressing Modes and Instruction Set Summary.” 














A floating-point unavailable exception is caused by an attempt to execute a 
floating-point instruction (including floating-point load, store, and move 
instructions) and the floating-point available bit is disabled, (MSR[FP]=0). 


Decrementer 00900 The decrementer exception occurs when the most significant bit of the 
decrementer (DEC) register transitions from 0 to 1. 


/O error OO0A00 An I/O error exception is taken only when an operation to an I/O segment fails 
(such a failure is indicated to the MPC601 by a particular bus reply packet). If an 
/O error exception is taken on a memory access directed to an I/O segment, the 
SRRO contains the address of the instruction following the offending instruction. 
Note that this exception may not be implemented in other PowerPC processors. 


System call 00C00 Asystem call exception occurs when a System Call (sc) instruction is executed. 
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Table 1-2. Exceptions and Conditions (Continued) 


Exception Vector Offset . oe 
Reserved 00E00 Other PowerPC processors may use this vector for floating-point assist 
exceptions. 


01000-02FFF | Reserved, implementation-specific 


Run mode The run mode exception is taken depending on the settings of the HID1 register 

exception and the MSR[SE] bit. 

The following modes correspond with bit settings in the HID1 register: 

« Normal run mode—no address break points are specified, and the MPC601 
executes from Zero to three instructions per cycle 

¢ Single instruction step mode—One instruction is processed at a time. The 
appropriate break action is taken after an instruction is executed and the 
processor quiesces. 

¢ Limited instruction address compare—The MPC601 runs at full speed (in 
parallel) until the EA of the instruction being decoded matches the EA 
contained in HID2. Addresses for branch instructions and floating-point 
instructions may never be detected. 

The following mode is taken when the MSR[SE] bit is set. 

« MSRI[SE] trace mode—Note that in other PowerPC implementations, the 

trace exception is a separate exception with its own vector x'00D00’. 


1.3.6 Memory Management 


The following subsections describe the PowerPC memory management in general, and the 
specific MPC601 implementation, respectively. 




























1.3.6.1 PowerPC Memory Management 

The primary functions of the MMU are to translate logical (effective) addresses to physical 
addresses for memory accesses, I/O accesses (most I/O accesses are assumed to be 
memory-mapped), and I/O controller interface accesses, and to provide access protection 
on blocks and pages of memory. 


There are three types of accesses generated by the MPC601 that require address translation: 
instruction accesses, data accesses to memory generated by load and store instructions, and 
I/O controller interface accesses generated by load and store instructions. 


The PowerPC MMU and exception model support demand-paged virtual memory. Virtual 
memory management permits execution of programs larger than the size of physical 
memory; demand-paged implies that individual pages are loaded into physical memory 
from backing storage only when they are first accessed by an executing program. 


PowerPC memory management differs for 32- and 64-bit implementations. Address 
translations are enabled by setting bits in the MSR—MSR[IT] enables instruction 
translations and MSR[DT] enables data translations. 
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1.3.6.2 MPC601 Memory Management 


The MPC601 MMU provides 4-Gbytes of logical address space accessible to supervisor 
and user programs with a 4-Kbyte page size and 256-Mbyte segment size. Block sizes 
range from 128 Kbyte to 8 Mbyte and are software selectable. In addition, the MPC601 
uses an interim 52-bit virtual address and hashed page tables in the generation of 32-bit 
physical addresses. 


A UTLB provides address translation in parallel with the on-chip cache access, incurring 
no additional time penalty. The UTLB is a 256-entry, two-way set-associative cache that 
contains instruction and data address translations. The MPC6()1 provides hardware table 
search capability on UTLB misses. Supervisor software can invalidate UTLB entries (both 
in the set) selectively. In addition, UTLB control instructions can optionally be broadcast 
on the external interface for remote invalidations. 


The MPC601 also provides a four-entry BTLB that maintains address translations for 
blocks of memory. These entries define blocks that can vary from 128 Kbytes to 8 Mbytes. 
The BTLB is maintained by system software. 


To accelerate the instruction unit operation, the MPC601 uses a four-entry ITLB. The ITLB 
contains up to four copies of the most recently used instruction address translations (page 
or block) providing the instruction unit access to the most recently used translations without 
requiring the UTLB or BTLB. The ITLB, including coherency, is maintained in hardware 
and uses an LRU replacement algorithm. 


The MPC601 has a high-bandwidth, 64-bit data bus and a 32-bit address bus. The MPC601 
interface protocol allows multiple masters to compete for system resources through a 
central external arbiter. Additionally, on-chip snooping logic maintains cache coherency in 
multiprocessor applications. The MPC601 supports single-beat and burst data transfers for 
memory accesses; it also supports both memory-mapped I/O and I/O controller interface 
addressing. 


The MPC601 MMU relies on the exception processing mechanism for the implementation 
of the paged virtual memory environment and for enforcing protection of designated 
memory areas. Exception processing is described in Chapter5, “Exceptions.” 
Section 2.3.1, “Machine State Register (MSR),” describes the MSR of the MPC601, which 
controls some of the critical functionality of the MMU. 


The hashed page table is a variable-sized data structure that defines the mapping between 
virtual page numbers and physical page numbers. The page table size is a power of 2, and 
its starting address is a multiple of its size. 


The page table contains a number of page table entry groups (PTEGs). A PTEG contains 
eight page table entries (PTEs) of eight bytes each; therefore each PTEG is 64 bytes long. 
PTEG addresses are entry points for table search operations. Figure 6-16 shows two PTEG 
addresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside. 
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1.3.7 Instruction Timing 


The PowerPC architecture is designed to minimize instruction latencies while maximizing 
overall instruction throughput. Although many of the instructions execute in a single clock 
cycle, in many cases overall instruction throughput is significantly greater than one 
instruction per clock cycle. Because the PowerPC architecture can be applied to such a 
wide variety of implementations, instruction timing details vary accordingly. 


The MPC601 processor has been designed to minimize average instruction execution 
latency. Latency is defined as the number of clock cycles necessary to execute an 
instruction and make ready the results of that execution for a subsequent instruction. For 
the majority of instructions in the MPC601, this can be simplified to include only the 
execute phase for a particular instruction. However, data access instructions require 
additional clock cycles between the execute phase and the writeback phase due to memory 
latencies. 


In accordance with this definition, logical, bit-field, and most integer instructions have a 
latency of one clock cycle (for example, results for these instructions are ready for use on 
the next clock cycle after issue). Other instructions, such as the integer multiply, require 
more than one clock cycle to complete execution. 


Effective throughput of more than one instruction per clock cycle can be realized by the 
many performance features in the MPC6()1 including pipelining, superscalar instruction 
issue, branch acceleration, and multiple execution units that operate independently and in 
parallel. 


Many of the execution units on the MPC601 are said to be pipelined. This implies that the 
particular execution unit is broken into stages. Each stage performs a specific step, which 
contributes to the overall execution of an instruction. The pipelined design is analogous to 
an assembly line where workers perform a specific task and pass the partially complete 
product to the next worker. 


When an instruction is issued to a pipelined execution unit, the first stage in the pipeline 
begins its designated work on that instruction. As an instruction is passed from one stage in 
the pipeline to the next, evacuated stages may accept new instructions. This design allows 
a single execution unit to be working on several different instructions simultaneously. Once 
the pipeline has been filled with instructions, the execution unit completes a multi-cycle 
instruction every clock. 


If the number of stages in each pipeline is equal to the total latency in clock cycles of its 
respective execution unit, the processor can continuously issue instructions to the same 
execution unit without stalling. Thus, when enough instructions have been issued to an 
execution unit to fill its pipeline, the first instruction will have completed execution and 
exited the pipeline, allowing subsequent instructions to be issued into the tail of the pipeline 
without interruption. 
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1.3.8 System Interface 


The system interface is specific for each PowerPC processor; however, processor designs 
provide the same basic set of signals, with differences depending largely upon other design 
factors. 


The MPC601 provides a versatile system interface that allows for a wide range of 
implementations. The interface includes a 32-bit address bus, a 64-bit data bus, and 52 
control and information signals (see Figure 1-5). The system interface allows for address- 
only transactions as well as address and data transactions. The MPC601 control and 
information signals include the address arbitration, address start, address transfer, transfer 
attribute, address termination, data arbitration, data transfer, data termination, and 
processor state signals. Test and control signals provide diagnostics for selected internal 
circuitry. 


ADDRESS 
ADDRESS ARBITRATION DATA ARBITRATION 
ADDRESS START DATA TRANSFER 
ADDRESS TRANSFER MPC601 DATA TERMINATION 
TRANSFER ATTRIBUTE PROCESSOR STATE 
ADDRESS TERMINATION TEST AND CONTROL 
CLOCKS 


DATA 


Processor 





43.6V = 
Figure 1-5. System Interface 


The system interface supports bus pipelining, which allows the address tenure of one 
transaction to overlap the data tenure of another. The extent of the pipelining depends on 
external arbitration and control circuitry. Similarly, the MPC601 supports split-bus 
transactions for systems with multiple potential bus masters—one device can have 
mastership of the address bus while another has mastership of the data bus. Allowing 
multiple bus transactions to occur simultaneously increases the available bus bandwidth for 
other activity and as a result, improves performance. 


The MPC601 supports multiple masters through a bus arbitration scheme that allows 
various devices to compete for the shared bus resource. The arbitration logic can implement 
priority protocols, such as fairness, and can park masters to avoid arbitration overhead. The 
MESI protocol ensures coherency among multiple devices and system memory. Also, the 
MPC601's on-chip cache and UTLBs and optional second-level caches can be controlled 
externally. Software support for atomic memory operations minimizes the effects of data 
dependencies in multiple processor implementations. 
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The MPC601 clocking structure allows the processor to operate at an integer multiple of 
the bus frequency. 


The following sections describe the MPC601 bus support for memory and I/O controller 
interface operations. Note that some signals perform different functions depending upon 
the addressing protocol used. 


1.3.8.1 Memory Accesses 

Memory accesses allow transfer sizes of 8, 16, 24, 32, 40, 48, 56, or 64 bits in one bus clock 
cycle. Data transfers occur in either single-beat transactions or four-beat burst transactions. 
A single beat transaction transfers as much as 64 bits. Single-beat transactions are caused 
by non-cached accesses that access memory directly (that is, reads and writes when caching 
is disabled, cache-inhibited accesses, and stores in write-through mode). Burst transactions, 
which always transfer an entire cache sector (32 bytes), are initiated when a sector in the 
cache is read from or written to memory. Additionally, the MPC601 supports address-only 
transactions used to invalidate entries in other processors’ TLBs and caches. 


1.3.8.2 I/O Controller Interface Operations 

Both memory and I/O accesses can use the same bus transfer protocols. The MPC601 also 
has the ability to define memory areas as I/O controller interface areas. Accesses to the I/O 
controller interface redefine the function of some of the address transfer and transfer 
attribute signals and add control to facilitate transfers between the MPC601 and specific I/O 
devices. I/O controller interface transactions provide multiple transaction operations for 
variably-sized data transfers (1 to 128 bytes) and support a split request/response protocol. 
The distinction between the two types of transfers is made with separate signals—TS for 
memory-mapped accesses and XATS for I/O controller interface accesses. Refer to 
Chapter 9, “System Interface Operation,” for more information. 


1.3.8.3 MPC601 Signals 
The MPC601 signals are grouped as follows: 


¢ Address arbitration signals—The MPC601 uses these signals to arbitrate for address 
bus mastership. 


¢ Address transfer start signals—These signals indicate that a bus master has begun a 
transaction on the address bus. 


e Address transfer signals—These signals, which consist of the address bus, address 
parity, and address parity error signals, are used to transfer the address and to ensure 
the integrity of the transfer. 

¢ Transfer aliribuie signais—These signals provide information about the type of 
transfer, such as the transfer size and whether the transaction is bursted, write- 
through, or cache-inhibited. 


¢ Address transfer termination signals—These signals are used to acknowledge the 
end of the address phase of the transaction. They also indicate whether a condition 
exists that requires the address phase to be repeated. 
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¢ Data arbitration signals—The MPC601 uses these signals to arbitrate for data bus 
mastership. 


¢ Data transfer signals—These signals, which consist of the data bus, data parity, and 
data parity error signals, are used to transfer the data and to ensure the integrity of 
the transfer. 


¢ Data transfer termination signals—Data termination signals are required after each 
data beat in a data transfer. In a single-beat transaction, the data termination signals 
also indicate the end of the tenure, while in burst accesses, the data termination 
signals apply to individual beats and indicate the end of the tenure only after the final 
data beat. They also indicate whether a condition exists that requires the data phase 
to be repeated. 


¢ System status signals—These signals include the interrupt signal, checkstop signals, 
and both soft- and hard-reset signals. These signals are used to interrupt and, under 
various conditions, to reset the processor. 


¢ Processor state signals—These two signals are used to set the reservation coherency 
bit and set the size of the MPC601’s output buffers. 


¢ Miscellaneous signals—These signals provide information about the state of the 
reservation coherency bit and the size of the MPC601's output buffers. 


¢ COP interface signals—The common on-chip processor (COP) unit is the master 
clock control unit and it provides a serial interface to the system for performing 
built-in self test (BIST). 


¢ Test interface signals—These signals are used for internal testing. 


¢ Clock signals—These signals determine the system clock frequency. These signals 
can also be used to synchronize multiprocessor systems. 


NOTE 


A bar over a signal name indicates that the signal is active 
low—for example, ARTRY (address retry) and TS (transfer 
start). Active-low signals are referred to as asserted (active) 
when they are low and negated when they are high. Signals that 
are not active-low, such as APO-AP3 (address bus parity 
signals) and TTO-TT4 (transfer type signals) are referred to as 
asserted when they are high and negated when they are low. 


1.3.8.4 Signal Configuration 


Figure 1-6 illustrates the MPC601 microprocessor's pin configuration, showing how the 
signals are grouped. 
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Figure 1-6. MPC601 Signal Groups 


1.3.8.5 Real-Time Clock Facility 


The real-time clock (RTC) facility, which is specific to the MPC601, provides a high- 
resolution measure of real time to provide time of day and date with a calendar range of 
136.19 years. The RTC consists of two registers—the RTC upper (RTCU) register and the 
RTC lower (RTCL) register. The RTCU register maintains the number of seconds from a 
point in time specified by software. The RTCL register counts nanoseconds. The contents 
of either register may be copied to any GPR. 
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Chapter 2 
Registers and Data Types 


This chapter describes the MPC6()1’s register organization, how these registers are 
accessed, and how data is formatted in these registers. 


The MPC6()1 always operates in one of three distinct states—teset state, checkstop state, 
and normal execution state, which includes both user- and supervisor-level operations. The 
three states are described as follows: 


* Reset state—In the reset state all processor instruction execution is aborted, registers 
are initialized appropriately, and external signals are placed in the high-impedance 
State. 


¢ Normal instruction execution state—When the MPC601 is in the normal execution 
state, it operates at one of two privilege levels—user mode, and supervisor mode, 
which can be accessed when an exception is taken. This access privilege determines 
which instructions and which of the registers software can access. 


* Checkstop state—When a processor is in the checkstop state, instruction processing 
is suspended and generally cannot be restarted without resetting the processor. The 
checkstop state is provided to help diagnose problems. 


The PowerPC architecture defines register-to-register operations for all computational 
instructions. Source operands for these instructions are accessed from the on-chip registers 
or are provided as immediate values embedded in the opcode. The three-register instruction 
format allows specification of a different target register from the two source registers, thus 
preserving the original data for use by other instructions and reducing the number of 
instructions required for certain operations. Data is transferred between memory and 
registers with explicit load and store instructions only. 


2.1 Normal Instruction Execution State 


During normal execution, a program can access the registers, shown in Figure 2-1, 
depending on the program’s access privilege (supervisor or user, determined by the 
privilege-level (PR) bit in the machine state register (MSR)). Note that registers such as the 
general-purpose registers (GPRs) and floating-point registers (FPRs) are accessed through 
operands that are part of the instructions. Access to registers can be explicit (that is, through 
the use of specific instructions for that purpose such as Move to Special-Purpose Register 
(mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicitly as the 
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part of the execution of an instruction. Some registers are accessed both explicitly and 
implicitly. 


The numbers to the left of the SPRs indicate the number that is used in the syntax of the 
instruction operands to access the register. 


SUPERVISOR PROGRAMMING SPR528 









°o 


FPRO 
FPR1 


FPR31 
0 63 


USER PROGRAMMING User-Level SPRs 

MODEL SPRO 
SPRI 
SPR4 
SPR5 
SPR8 
SPR9Q 












Supervisor-Level SPRs 
Condition 


Register 







SPR18 
SPR19 
SPR22 
SPR25 
SPR26 
SPR27 
SPR272 
SPR273 
SPR274 
SPR275 
SPR282 
SPR286 


Floating Point 
Status and 
Control 
Register 


FPSCR 


Oo 


3 


= 


MODEL SPR529 
SPR530 

SPR531 

Segment SPR532 

Registers SPR533 

SRO SPR534 

SR1 SPR535 


Machine State SPR1008 
Register * SPR1009 


31.20 31 SPR1013 
SPR1023 


HID5 (DABR) ' 


0 


' MPC601-only registers. These registers may not be supported by other PowerPC processors. 
2 These registers are implemented differently on other PowerPC processors. 
3 The RTCU and RTCL registers can only be written in supervisor mode. 


tee OF ty ashen Seas & Ba ean 31 


BAT1U—Instruction BAT 1 Upper 2 
BAT1L—Instruction BAT 1 Lower 2 
BAT2U—Instruction BAT 2 Upper 2 
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31 


4The DEC can be read by user programs by specifying SPR6 in the mfspr instruction (for POWER compatibility) 


Figure 2-1. Programming Model—Registers 
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The following paragraphs discuss the MPC60)1’s user- and supervisor-level registers. 


¢ User-level registers—tThe user-level registers can be accessed by all software with 
either user or supervisor privileges. These include the following: 


— General-purpose registers (GPRs). The general-purpose register file consists of 
thirty-two, 32-bit GPRs designated as GPRO-GPR31. This register file serves as 
the data source or destination for all integer instructions and provides addresses 
for all memory-access instructions. See Section 2.2.1, “General Purpose 
Registers (GPRs),” for more information. 


— Floating-point registers (FPRs). The floating-point register file consists of thirty- 
two, 64-bit FPRs designated as FPRO—-FPR31, which serve as the data source or 
destination for all floating-point instructions. These registers can contain data 
objects of either single- or double-precision floating-point formats. The floating- 
point register file is part of the FPU. For more information, see Section 2.2.2, 
“Floating-Point Registers (FPRs).” 


— Floating-point status and control register (FPSCR). The FPSCR is a user-control 
register in the FPU. It contains all floating-point exception signal bits, exception 
summary bits, exception enable bits, and rounding control bits needed for 
compliance with the IEEE 754 standard. For more information, see 
Section 2.2.3, “Floating-Point Status and Control Register (FPSCR).” 


— Condition register (CR). The condition register is a 32-bit register, divided into 
eight 4-bit fields, CRO—CR7, that reflects the result of certain arithmetic 
operations and provides a mechanism for testing and branching. For more 
information, see Section 2.2.4, “Condition Register (CR).” 


The remaining user-level registers are SPRs. Note however that while the PowerPC 
architecture provides a separate mechanism for accessing SPRs, this mechanism is 
not the usual method for accessing user-level SPRs. 


— MQ register (MQ). The MQ register is a MPC601-specific, 32-bit register used 
as aregister extension to accommodate the product for the multiply instructions 
and the dividend for the divide instructions. It is also used as an operand of long 
rotate and shift instructions. This register is provided for compatibility with 
POWER architecture, and is not part of the PowerPC architecture. For more 
information, see Section 2.2.5.1, “MQ Register (MQ).” The MQ register is 
typically accessed implicitly as part of executing a computational instruction. 


— Integer exception register (XER). The XER is a 32-bit register that indicates such 
things as overflow and carries for integer operations. For more information, see 
Section 2.2.5.2, “Integer Exception Register (XER).” 


— Real-time clock (RTC) registers—RTCU and RTCL (RTC upper and RTC 
lower). The RTCU register maintains the number of seconds from a time 
specified by software. The RTCL register maintains a fraction of the current 
second in nanoseconds. The contents of either register can be copied to any GPR. 
These registers are specific to the MPC601. These registers are not supported in 
the PowerPC architecture, which uses the time base facility rather than a separate 
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real-time clock. For more information, see Section 2.2.5.3, “Real-Time Clock 
(RTC) Registers.” 


— Link register (LR). The 32-bit link register provides the branch target address for 
the Branch Conditional to Link Register (belrx) instruction, and can optionally 
be used to hold the logical address of the instruction that follows a branch and 
link instruction. Although this is an SPR, it is not typically accessed through the 
PowerPC’s SPR mechanism. For more information, see Section 2.2.5.4, “Link 
Register (LR).” 


— Count register (CTR). The count register is a 32-bit register for holding a loop 
count that can be decremented during execution of appropriately coded branch 
instructions. The CTR can also provide the branch target address for the Branch 
Conditional to Count Register (bectrx) instruction. Although this is an SPR, it is 
not typically accessed through the PowerPC’s SPR mechanism. For more 
information, see Section 2.2.5.5, “Count Register (CTR).” 


Supervisor-level registers—The MPC601 incorporates registers that can be 
accessed only by programs executed with supervisor privileges. These registers 
consist of the machine state register, segment registers, and supervisor SPRs, 
described as follows: 


— The machine state register (MSR), shown in Figure 2-11, is a 32-bit register that 
defines the state of the processor. The MSR can be modified by the Move to 
Machine State Register (mtmsr), System Call (sc), and Return from Exception 
(rfi) instructions. It can be read by the Move from Machine State Register 
(mfmsr) instruction. Note that in other PowerPC implementations, the MSR is a 
64-bit register. 


— Segment registers. The sixteen 32-bit segment registers are present only in 32-bit 
PowerPC implementations. Figure 2-12 and Figure 2-13 show the format of a 
segment register. The fields in the segment register are interpreted differently 
depending on the value of bit 0. 


The remaining supervisor-level registers are SPRs: 


— The 32-bit DAE/source instruction service register (DSISR) defines the cause of 
data access and alignment exceptions; see Figure 2-14. For more information, 
see Section 2.3.3.2, “DAE/Source Instruction Service Register (DSISR).” 


— The data address register (DAR) is a 32-bit register shown in Figure 2-15. After 
a data access or an alignment exception, DAR is set to the effective address of a 
load or store element. For more information, see Section 2.3.3.3, “Data Address 
Register (DAR).” 


— The decrementer register (DEC) is a 32-bit decrementing counter that provides 
a mechanism for causing a decrementer exception after a programmable delay. 
In the MPC601, the RTC provides the frequency for the DEC. In other PowerPC 
implementations, the frequency is a subdivision of the processor clock. For more 
information, see Section 2.3.3.4, “Decrementer (DEC) Register.” 
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— The 32-bit table search descriptor register 1 (SDR1) specifies the page table 
variables used in virtual-to-physical address translation. For more information, 
see Section 2.3.3.5, “Table Search Descriptor Register 1 (SDR1).” 


— The machine status save/restore register 0 (SRRO) is a 32-bit register that is used 
by the MPC60)1 for saving machine status on exceptions and restoring machine 
status when an rfi instruction is executed. SRRO is shown in Figure 2-18. For 
more information, see Section 2.3.3.6, “Machine Status Save/Restore Register () 
(SRRO).” 


— The machine status save/restore register 1 (SRR1) is a 32-bit register used to 
save machine status on exceptions and to restore machine status when an rft 
instruction is executed. SRR1 is shown in Figure 2-19. For more information, 
see Section 2.3.3.7, “Machine Status Save/Restore Register | (SRR1).” 


— The general SPRs, SPRGO-SPRG3, are 32-bit registers provided for operating 
system use. See Figure 2-20. For more information, see Section 2.3.3.8, 
“General SPRs (SPRGO-SPRG3).” 


— The external access register (EAR) is a 32-bit register that controls access to the 
external control facility through the eciwx and ecowx instructions. Note that the 
EAR register and the eciwx and ecowx instructions are an optional part of the 
PowerPC architecture and may not be supported in other PowerPC processors. 
For more information about the external control facility, see Section 2.3.3.9, 
“External Access Register (EAR).” 


— The processor version register (PVR) is a 32-bit, read-only register that identifies 
the version (model) and revision level of the PowerPC processor. The contents 
of the PVR can be copied to a GPR by the Move from Special Purpose Register 
(mfspr) instruction. For more information, see Section 2.3.3.10, “Processor 
Version Register (PVR).” 


— Block-address translation (BAT) registers. The MPC601 includes eight block- 
address translation registers (BATs), consisting of four pairs of BATs (BATOU-— 
BAT3U and BATOL-—BAT3L). See Figure 2-1 for a list of the SPR numbers for 
the BAT registers. Figure 2-23 and Figure 2-24 show the formats of the upper 
and lower BAT registers. Note that other PowerPC implementations have two 
sets of four BAT pairs—four sets of upper and lower IBATs (which occupy the 
space of the unified BATs in the MPC601) and four sets of upper and lower 
DBATs (located in the subsequent eight positions at SPR numbers 536-543). 


— The hardware implementation registers, HIDO—HID2, HID5, and HID15 are 
provided primarily for debugging. For more information, see Section 2.3.3.12.1, 
“Checkstop Sources and Enables Register—HIDO” through Section 2.3.3.12.5, 
“Processor Identification Register (PIR)—HID15.” HID15 holds the four-bit 
processor identification tag (PID) that is useful for differentiating processors in 
multiprocessor system designs, For more information, see Section 2.3.3.12.5, 


“Processor Identification Register (PIR)—HID15.” 
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Note that there are registers common to other PowerPC processors not implemented in the 
MPC601. When the MPC601 detects SPR encodings other than those defined in this 
document, it either takes a program exception (if bit 0 of the SPR encoding is set) or it treats 
the instruction as a no-op (if bit 0 of the SPR encoding is clear). 


2.1.1 Changing Privilege Levels 


During normal instruction execution, the processor operates using either user- or 
supervisor-level instructions and registers. Supervisor-level access is provided through the 
MPC601’s exception mechanism. That is, when an exception is taken, either due to an error 
or problem that needs to be serviced or deliberately through the use of a trap instruction, 
the processor begins operating in supervisor mode. The level of access is indicated by the 
privilege-level (PR) bit in the MSR. 


2.2 User-Level Registers 


This section describes in detail the registers that can be accessed by user-level software. All 
user-level registers can be accessed by supervisor-level software. 


2.2.1 General Purpose Registers (GPRs) 


Integer data is manipulated in the IU's thirty-two 32-bit GPRs shown in Figure 2-2. These 
registers are accessed as source and destination registers through operands in the 
instruction syntax. 





Figure 2-2. General Purpose Registers (GPRs) 


All GPRs are cleared by hard reset. 


2.2.2 Floating-Point Registers (FPRs) 


The PowerPC architecture provides thirty-two, 64-bit FPRs as shown in Figure 2-3. These 
registers are accessed as source and destination registers through operands in floating-point 
instructions. Each FPR supports the double-precision, floating-point format. Every 
instruction that interprets the contents of an FPR as a floating-point value uses the double- 
precision floating-point format for this interpretation. 
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All floating-point arithmetic instructions operate on data located in FPRs and, with the 
exception of the compare instructions, place the result into an FPR. Information about the 
status of floating-point operations is placed into the floating-point status and control 
register (FPSCR) and in some cases, into CR after the completion of the operation’s final 
writeback stage. For information on how CR is affected for floating-point operations, see 
Section 2.2.4, “Condition Register (CR).” 


Load and store double instructions are provided that transfer 64 bits of data between 
memory and the FPRs in the floating-point processor with no conversion. Load single 
instructions are provided to transfer and convert floating-point values in floating-point 
format from memory to the same value in double-precision floating-point format in the 
FPRs. Store single instructions are provided to transfer and convert floating-point values in 
double-precision floating-point format from the FPRs to the same value in single-precision 
floating-point format in memory. 


Single- and double-precision arithmetic instructions accept values from the FPRs in 
double-precision format. For single-precision arithmetic instructions, all input values must 
be representable in single-precision format; otherwise, the result placed into the target FPR 
and the setting of status bits in the FPSCR and in the condition register are undefined. 


The MPC601’s floating-point arithmetic instructions produce intermediate results that may 
be regarded as infinitely precise. After normalization or denormalization, if the precision of 
the intermediate result cannot be represented in the destination format (either 32-bit or 64- 
bit) then it must be rounded. The final result is then placed into the FPR in the double- 
precision format. : 





Figure 2-3. Floating-Point Registers (FPRs) 


All FPRs are cleared by hard reset. 


2.2.3 Floating-Point Status and Control Register (FPSCR) 


The FPSCR, shown in Figure 2-4, controls the handling of floating-point exceptions and 
records status resulting from the floating-point operations. Bits 0-23 are status bits. Bits 
24-31 are control bits. Bits in the FPSCR are updated after an operation’s final writeback 
stage. 
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The floating-point exception condition bits in the FPSCR are bits 0-12 and 21-23 and are 
sticky, except for the floating-point enabled exception summary (FEX) and floating-point 
invalid operation exception summary (VX). That is, once set sticky bits remain set until 
they are cleared by an merfs, mtfsfi, mtfsf, or mtfsb0 instruction. 


FEX and VX are the logical ORs of other FPSCR bits. Therefore these two bits are not 
listed among the FPSCR bits directly affected by the various instructions. 


FPSCR Reserved 


VXZDZ VXSOFT 
VXIMZ VXSQRT 
VXVC VXCVI 


EEC Ce eee Bees 


3 45 6 7 8 9 10 11 12 13 14 15 19 20 21 22 23 24 25 26 27 28 29 30 31 


VXIDI 
VXISI 
VXSNAN 







Figure 2-4. Floating-Point Status and Control Register (FPSCR) 
A listing of FPSCR bit settings is shown in Table 2-1. 


Table 2-1. FPSCR Bit Settings 


a 


Floating-point exception summary (FX). Every floating-point instruction implicitly sets 
FPSCRI[FX] if that instruction causes any of the floating-point exception bits in the FPSCR to 
transition from 0 to 1. The merfs instruction implicitly clears FPSCR[FX] if the FPSCR field 
containing FPSCR[FX] is copied. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions can set 
or clear FPSCR[FX] explicitly. This is a sticky bit. 


Floating-point enabled exception summary (FEX). This bit signals the occurrence of any of the 
enabled exception conditions. It is the logical OR of all the floating-point exception bits 
masked with their respective enable bits. The merfs instruction implicitly clears FPSCR[FEX] 
if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and 
mtfsb1 instructions cannot set or clear FPSCRIFEX] explicitly. This is not a sticky bit. 


Floating-point invalid operation exception summary (VX). This bit signals the occurrence of 
any invalid operation exception. It is the logical OR of all of the invalid operation exceptions. 
The merfs instruction implicitly clears FPSCR[VX] if the result of the logical OR described 
above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear 
FPSCR[VX] explicitly. This is not a sticky bit. 


Floating-point overflow exception (OX). This is a sticky bit. See Section 5.4.7.4, “Overflow 
Exception Condition.” 


Floating-point underflow exception (UX). This is a sticky bit. See Section 5.4.7.5, “Underflow 
Exception Condition." 


Floating-point zero divide exception (ZX). This is a sticky bit. See Section 5.4.7.3, “Zero Divide 
Exception Condition." 


Floating-point inexact exception (XX). This is a sticky bit. See Section 5.4.7.6, “Inexact 
Exception Condition.” 
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Table 2-1. FPSCR Bit Settings (Continued) 


a 


VXSNAN | Floating-point invalid operation exception for SNaN (VXSNAN). This is a sticky bit. See 
















Section 5.4.7.2, “Invalid Operation Exception Conditions." 


VXISI Floating-point invalid operation exception for o-oo (VXISI). This is a sticky bit. See Section 
5.4.7.2, “Invalid Operation Exception Conditions." 

VXIDI Floating-point invalid operation exception for oo/oc (VXIDI). This is a sticky bit. See Section 
5.4.7.2, “Invalid Operation Exception Conditions." 

VXZDZ Floating-point invalid operation exception for 0/0 (VXZDZ). This is a sticky bit. See Section 
5.4.7.2, “Invalid Operation Exception Conditions." 


at wa 











Floating-point invalid operation exception for *0 (VXIMZ). This is a sticky bit. See Section 
5.4.7.2, “Invalid Operation Exception Conditions." 








Floating-point invalid operation exception for invalid compare (VXVC). This is a sticky bit. See 
Section 5.4.7.2, “Invalid Operation Exception Conditions." 











Floating-point fraction rounded (FR). The last floating-point instruction that potentially rounded 
the intermediate result incremented the fraction. (See Section 2.4.9.6, “Rounding.") This bit is 
not sticky. 





Floating-point fraction inexact (Fl). The last floating-point instruction that potentially rounded 
the intermediate result produced an inexact fraction or a disabled exponent overflow. (See 
Section 2.4.9.6, “Rounding.") This bit is not sticky. 







Floating-point result flags (FPRF). This field is based on the value placed into the target 

register even if that value is undefined. Refer to Table 2-2 for specific bit settings. 

15 Floating-point result class descriptor (C). Floating-point instructions other than the 
compare instructions may set this bit with the FPCC bits, to indicate the class of the 
result. 

16-19 Floating-point condition code (FPCC). Floating-point compare instructions always 
set one of the FPCC bits to one and the other three FPCC bits to zero. Other 
floating-point instructions may set the FPCC bits with the C bit, to indicate the class 
of the result. Note that in this case the high-order three bits of the FPCC retain their 
relational significance indicating that the value is less than, greater than, or equal to 
Zero. 

16 Floating-point less than or negative (FL or <) 
17 Floating-point greater than or positive (FG or >) 


18 Floating-point equal or zero (FE or =) 


15-19 | FPRF 
19 Floating-point unordered or NaN (FU or?) 


fe eeOOSCSCSCSCSCSC“‘“‘“‘“*S 


21 VXSOFT | Not implemented in the MPC601. Some implementations use this as the floating-point invalid 
operation exception for software request (VXSOFT). This bit can be altered only by the merfs, 

















mtfsfi, mtfsf, mtfsb0, or mtfsb1 instructions. The purpose of VXSOFT is to allow software to 
cause an invalid operation condition for a condition that is not necessarily associated with the 
execution of a floating-point instruction. For example, it might be set by a program that 
computes a square root if the source operand is negative. This is a sticky bit. See Section 
5.4.7.2, “Invalid Operation Exception Conditions." 







software can simulate fsqrt and frsqrte, and to provide a consistent interface to handle 
exceptions caused by square-root operations. See Section 5.4.7.2, “Invalid Operation 
Exception Conditions." 






22 VXSQRT | Not implemented in the MPC601. Some implementations use this as the floating-point invalid 
operation exception for invalid square root (VXSQRT). This is a sticky bit. This guarantees that 
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Table 2-1. FPSCR Bit Settings (Continued) 


Floating-point invalid operation exception for invalid integer convert (VXCVI). This is a sticky 
bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions." 


Floating-point invalid operation exception enable (VE). See Section 5.4.7.2, “Invalid Operation 
Exception Conditions." 


Floating-point overflow exception enable (OE). See Section 5.4.7.4, “Overflow Exception 
Condition." 


Floating-point underflow exception enable (UE). This bit should not be used to determine 
whether denormalization should be performed on floating-point stores. See Section 5.4.7.5, 
“Underflow Exception Condition." 


Floating-point zero divide exception enable (ZE). See Section 5.4.7.3, “Zero Divide Exception 
Condition." 


Floating-point inexact exception enable (XE). See Section 5.4.7.6, “Inexact Exception 
Condition." 


Floating-point rounding control (RN). See Section 2.4.9.6, “Rounding.” 
Round to nearest 
Round toward zero 
Round toward +infinity 
Round toward - infinity 





Table 2-2 illustrates the floating-point result flags used by the MPC601. The result flags 
correspond to FPSCR bits 15-19. 


Table 2-2. Floating-Point Result Flags in FPSCR 

















Result Flags 
(Bits 15-19) 
C<>=? 


The FPSCR is cleared by hard reset. 







Result value class 
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2.2.4 Condition Register (CR) 


The condition register (CR) is a 32-bit register that reflects the result of certain operations 
and provides a mechanism for testing and branching. The bits in the CR are grouped into 
eight 4-bit fields, CRO-CR7, as shown in Figure 2-5. 


34 78 1112 


0 1516 1920 2324 27 28 31 
Figure 2-5. Condition Register (CR) 


The CR fields can be set in one of the following ways: 


¢ Specified fields of the CR can be set by a move instruction (mtcrf, or merfs) to the 
CR from a GPR. 


¢ Specified fields of the CR can be moved from one CRx field to another with the 
merf instruction. 

¢ Aspecified field of the CR can be set by a move instruction (merxr) to the CR from 
the XER. 


* Condition register logical instructions can be used to perform logical operations on 
specified bits in the condition register. 


« CRO can be the implicit result of an integer operation. 
¢ CRI can be the implicit result of a floating-point operation. 


* Aspecitied CR field can be the explicit result of either an integer or floating-point 
compare instruction. 


Instructions are provided to test individual CR bits. The condition register is cleared by 
hard reset. 


2.2.4.1 Condition Register CRO Field Definition 

In most integer instructions, when the record bit, Rc, is set, CRO is generated by an 
algebraic comparison of the result to zero. The integer arithmetic and logical instructions 
(addic., andi., and andis.) generate these four bits in CRO implicitly. These bits are shown 
in Table 2-3. In the descriptions below, the result refers to the 32-bit value placed into the 
target register. If any portion of the result is undefined, the value placed in the CRO field is 
undefined. 





MOTOROLA Chapter 2. Registers and Data Types 2-11 


Table 2-3. Bit Settings for CRO Field of CR 


; 


0." Negative (LT)—This bit is set when the result is negative. 





1 Positive (GT)—This bit is set when the result is positive (and not zero) 


ee Zero (EQ)—This bit is set when the result is zero. 
oe Summary overflow (SO)—This is a copy of the final state of XER[SO] at the completion of the instruction. 


2.2.4.2 Condition Register CR1 Field Definition 


In all floating-point instructions except merfs, fempu, and fempo, when record option is 
specified, CR1 is copied from bits 0-3 of the floating-point status and control register 
(FPSCR). For more information about the FPSCR, see Section 2.2.3, “Floating-Point 
Status and Control Register (FPSCR).” The bit settings for the CR1 field are shown in 
Table 2-4. 






Table 2-4. Bit Settings for CR1 Field of CR 


4 Floating-point exception (FX)—This is a copy of the final state of FPSCRI[FX] at the completion of the 
instruction. 
Floating-point enabled exception (FEX)—This is a copy of the final state of FPSCR[FEX] at the 
completion of the instruction. 


Floating-point invalid exception (VX)—This is a copy of the final state of FPSCR[VX] at the completion of 
the instruction. 


7 Floating-point overflow exception (OX)—This is a copy of the final state of FPSCR[OX] at the completion 
of the instruction. 


2.2.4.3 Condition Register CRn Field—Compare Instruction 

When a specified CR field is set by a compare instruction, the bits of the specified field are 
interpreted, as shown in Table 2-5. A condition register field can also be accessed by the 
mfcr, merf, and mterf instructions. | 
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Table 2-5. CRn Field Bit Settings for Compare Instructions 


Less than, Floating-point less than (LT, FL). 

For integer compare instructions, (rA) < SIMM, UIMM, or (rB) (algebraic comparison) or (rA) SIMM, 
UIMM, or (rB) (logical comparison). 

For floating-point compare instructions, (frA) < (frB). 


Greater than, floating-point greater than (GT, FG). 
For integer compare instructions, (rA) > SIMM, UIMM, or (rB) (algebraic comparison) or (rA) SIMM, 


UIMM, or (rB) (logical comparison). 
For floating-point compare instructions, (frA) > (frB). 


Equal, floating-point equal (EQ, FE). 
For integer compare instructions, (rA) = SIMM, UIMM, or (rB). 
For floating-point compare instructions, (frA) = (frB). 


Summary overflow, floating-point unordered (SO, FU). 

For integer compare instructions, this is a copy of the final state of XER[SO] at the completion of the 
instruction. 

For floating-point compare instructions, one or both of (frA) and (frB) is not a number (NaN). 


*Here, the bit indicates the bit number in any one of the four-bit subfields, CRO-CR7. 


2.2.5 User-Level SPRs 


User-level SPRs can be accessed by either user- or supervisor-level instructions. Typically, 
these registers are accessed implicitly though the encoding of the instruction rather than 
explicitly through the Move to Special Purpose Register (mtspr) and Move from Special 
Purpose Register (mfspr) instructions. Some SPRs are implementation-specific; some 
SPRs in the MPC601 may not be implemented in other PowerPC processors, or may not be 
implemented in the same way in other PowerPC processors. 


For registers with reserved bits, implementations return zeros or return the value last 
written to those bits. Table 2-6 summarizes how the MPC601 treats the undefined bits in 
the user-level SPRs. 


Table 2-6. Undefined Bits in User-Level SPRs 
F Value Returned 


Note that some SPR bits are reserved—the results of writing to and reading from these bits 
are undetined. Some of these bits are used by other PowerPC implementations. 










The RTCL register is defined as 32 bits, but the lowest-order seven bits are not 
implemented. Those bits are reserved, and zeroes are loaded into the respective bit 
positions of the target register when the RTCL is read. 
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When the MPC601 detects SPR encodings other than those defined in this document, it 
either takes a program exception (if bit 0 of the SPR encoding is set) or it treats the 
instruction as a no-op (if bit 0 of the SPR encoding is clear). 


2.2.5.1 MQ Register (MQ) 


The MQ register (MQ), shown in Figure 2-6, is a 32-bit register used as a register extension 
to accommodate the product for the multiply instruction and the dividend for the divide 
instruction. It is also used as an operand of long rotate and shift instructions. The MQ 
register is implemented on the MPC601. 


0 31 


Figure 2-6. MQ Register (MQ) 


The MQ register is not defined in the PowerPC architecture. However, in the MPC601, it 
may be modified during the execution of the mulli, mullw, mulhs, mulhu, divw, and 
divwu instructions, which are PowerPC instrutions. 


The value written to the MQ register during these operations is operand-dependent and 
therefore, the MQ contents become undefined after any of these instructions executes. In 
addition, the MQ is modified by the implementation-specific instructions supported by the 
MPC601 that are not part of the PowerPC architecture. These are listed in Table 2-7. 


Table 2-7. MPC601-Specific Instructions that Modify the MQ Register 


[teen [erste || 
Cn Co 
[aie [ester =i mt 
fag 
[ag 
ee 






[ag [Site cmansah na ————~ Reo 
[tig iterate in| Reo 
[aig [eittorg whi Resto 
fon [sitorwnwa dW 
[aia | Sai Ageicimedi wn | wie 
[ee 
[ae 
[eee 











Shift Right Extended 
Shift Right Extended Algebraic 
Shift Right Extended with MQ 
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Table 2-7. MPC601-Specific Instructions that Modify the MQ Register (Continued) 


Cie [teste | | 
fsriq Shift Right Immediate with MQ 
[stig | Shift Right Long Immediate with MQ Read/write 












Shift Right Long with MQ 
Shift Right with MQ 


The PowerPC instructions listed in Table 2-8 use the MQ register but leave it in an 
undefined state. 





Table 2-8. PowerPC Instructions that Use the MQ Register 


[none [wean 
| multi | Multiply Low Immediate 














The Move to Special Purpose Register (mtspr) and Move from Special Purpose Register 
(mfspr) can access the MQ register. The SPR number for the MQ register is 0. 





The MQ register is not part of the PowerPC architecture and will not be supported in all 
PowerPC microprocessors. The MQ register is cleared by hard reset. 


2.2.5.2 Integer Exception Register (XER) 
The integer exception register (XER) is a user-level, 32-bit register shown in Figure 2-7. 


Reserved 


Byte compare value : Byte count 





0123 15 16 23 24 25 31 
Figure 2-7. Integer Exception Register (XER) 


The SPR number for the XER is 1. The bit definitions for XER, shown in Table 2-9, are 
based on the operation of an instruction considered as a whole, not on intermediate results. 
For example, the result of the Subtract from Carrying (subfcx) instruction is specified as 
the sum of three values. This instruction sets bits in the XER based on the entire operation, 
not on an intermediate sum. 


The XER is cleared by hard reset. 
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Table 2-9. Integer Exception Register Bit Definitions 





Summary Overflow (SO)—The summary overflow bit (OV) is set whenever an instruction sets the 
overflow bit (OV) to indicate overflow and remains set until software clears it. It is not altered by 
compare instructions or other instructions that cannot overflow. 







Overflow (OV)—The overflow bit is set to indicate that an overflow has occurred during execution 
of an instruction. Integer and subtract instructions having OE=1 set OV if the carry out of bit Ois 
not equal to the carry out of bit 1, and clear it otherwise. The OV bit is not altered by compare 
instructions or other instructions that cannot overflow. 










Carry (CA)—In general, the carry bit is set to indicate that a carry out of bit 0 occurred during 
execution of an instruction. Add carrying, subtract from carrying, add extended, and subtract from 
extended instructions set CA to one if there is a carry out of bit 0, and clear it otherwise. The CA 

bit is not altered by compare instructions, or other instructions that cannot carry, except that shift 
right algebraic instructions set the CA bit to indicate whether any '1' bits have been shifted out of a 


negative quantity. 


f= [memes 
16-23 Ld This field contains the byte to be compared by a Load String and Compare Byte Indexed (Iscbx) 
instruction 











This field specifies the number of bytes to be transferred by a Load String Word Indexed (Iswx), 
Store String Word Indexed (stswx) or Load String and Compare Byte Indexed (Iscbx) instruction. 


2.2.5.3 Real-Time Clock (RTC) Registers 

The real-time clock (RTC) registers provide a high-resolution measure of real time for 
indicating the date and time of day. The RTC facility provides a calendar range of roughly 
135 years. The RTC registers are not implemented on other PowerPC processors; instead, 
other PowerPC processors use a time base which is a subdivision of the processor clock. 





The RTC input is sampled using the CPU clock. Therefore, if the CPU clock is less than 
twice the RTC frequency, real-time clock (and decrementer) sampling and incrementing 
errors will occur. Therefore, in systems that change the CPU clock frequency dynamically 
beyond this limit, a method of saving and restoring the real-time clock register values via 
external means is required. 


The RTC registers, shown in Figure 2-8, consist of the following: 


¢ Real-time clock upper (RTCU)—This register specifies the number of seconds that 
have elapsed since the time specified in the software. 


¢ Real-time clock lower (RTCL)—This register contains the number of nanoseconds 
since the beginning of the current second. 
Together, RTCU and RTCL provide a high-resolution measurement of real time. 


Reading any portion of the RTC registers does not affect its contents. The writing of the 
RTCU and RTCL registers is allowed for supervisor programs only (mtspr is supervisor- 
only for RTC registers) 
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0 31 
(1) 


Reserved 





12 2425 31 


0 
(2) 
Figure 2-8. Real-Time Clock (RTC) Registers 


The RTC runs constantly while power is applied. Note that the RTC will not be 
implemented in other PowerPC processors. The condition register is cleared by hard reset. 
Note however, that if an external clock is connected to the RTC, the RTCL and RTCU 
registers can change from their initial values without receiving instructions to load those 
registers. 


2.2.5.3.1 Real-Time Clock Lower (RTCL) Register 

The RTCL functions as a 23-bit counter that provides the lower word of the RTC. As an 
indicator of the granularity of the RTC, enough bits are implemented to provide a resolution 
that is finer than the time required to execute 10 Add Immediate (addi) instructions. The 
following details describe the RTCL: 


¢ Bits 0-1 and bits 25-31 are not implemented. (The number of lower order bits 
required is determined by the frequency of the oscillator—7.8125 MHz) 


¢ The least significant implemented bit of the RTCL (bit 24) is incremented every 128 
ns. 


¢ The period of the RTCL is one billion nanoseconds (one second). 


¢ Unless it is altered by software, the RTCL reaches its terminal count value of 
999,999,872 (one billion minus 128) after 999,999,999 nS. The next time RTCL is 
incremented, it cycles to all zeros and RTCU is incremented. 


¢ Using the mfspr instruction with RTCL does not affect its contents. Unimplemented 
bits are read as zeros. 


¢ Ifthe mtspr instruction is used to replace the contents of the RTCL with the contents 
of a GPR, the values of the GPR corresponding to the unimplemented bits in the 
RTCL are ignored. 


2.2.5.3.2 Real-Time Clock Upper (RTCU) Hegisier 

The RTCU register is a 32-bit binary counter in which the least-significant bit is 
incremented in synchronization with the transition to zero of the RTCL counter (after one- 
billion nanoseconds—that is every second). All 32 bits of the RTCU are implemented. 
When the RTCU is set to all ones, the next time it is incremented it becomes all zeros. 
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When the contents of the RTCU or the RTCL are copied to a GPR, bits in the GPR 
corresponding to the unimplemented bits in the RTCL are cleared. 


2.2.5.3.3 Reading the RTC 


The contents of either RTC register can be copied into a GPR by user programs with the 
mfspr instruction. Because the RTCL continues to increment and the RTCU may be 
incremented while instructions are being executed that read the two RTC registers, when 
the current time is required in a form that includes more than the upper or lower word of 
the RTC, the following procedure should be used: 


1. Execute the following instruction sequence: 


mfspr rA,r4 
mfspr rB,r5 
mfspr rC,r4 


2. If (rC) = (rA) 


then the correct value has been obtained 
else repeat step 1 


Step 2 is required because the RTC continues to increment and the RTCU may increment 
while the instructions that read the two halves of the RTC are being executed. If the values 
in rC and rA match, the RTCU has not been incremented, and the RTCU value can be used 
along with the value in rB as the current RTC value. However, if the values of rC and rA 
differ, the RTCU has been incremented and it cannot be guaranteed which, if either, RTCU 
value should be associated with the value in rB. 


Successive readings of the RTC registers do not necessarily give unique values. If unique 
values are required, and the RTCL being updated at least once every ten add immediate 
instruction times is insufficient to ensure unique values, a software solution is required. 


2.2.5.3.4 RTC Synchronization in a Multiprocessor System 
Typically, RTCs must be synchronized in a multiprocessor system. 


One way to achieve synchronization is to use a gated RTC clock as the input to all 
MPC601s in a system. The gate clock can be enabled and disabled through the use of an 
I/O access (either I/O controller interface store instruction to a selected BUID, or a 
memory-mapped I/O access). This allows the RTC input clock to all processors to be turned 
on and off at the same time. Each processor's RTC register can then be loaded to the same 
value before starting the RTC input clock. 


2.2.5.4 Link Register (LR) 

The 32-bit LR supplies the branch target address for the Branch Conditional to Link 
Register (bclrx) instruction, and can be used to hold the logical address of the instruction 
that follows a branch and link instruction. The format of LR is shown in Figure 2-9. 
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Branch Address 


‘ 


0 31 
Figure 2-9. Link Register (LR) 


Note that although the two least-significant bits can accept any values written to them, they 
are ignored when the LR is used as an address. The link register can be accessed by the 
mtspr and mfspr instructions using the SPR number 8 (the instruction encoding juxtaposes 
the 10-bit binary representation, b’01000 00000’). Prefetching instructions along the target 
path (loaded by an mtspr instruction) is possible provided the link register is loaded 
sufficiently ahead of the branch instruction. It is usually possible to prefetch along a target 
path loaded by a branch and link instruction. 


Branching can be conditional or unconditional, and the return address can optionally be 
provided. If the return address is to be provided, the effective address of the instruction 
following the branch instruction is placed into the LR after the branch target address has 
been computed— this is done regardless of whether the branch is taken. 


As a performance optimization, and as an aid for handling the precise exception model, the 
MPC601 implements a two-entry link register shadow. Shadowing allows the link register 
to be updated by branch instructions that are executed out-of-order with respect to integer 
instructions without destroying machine state information if any integer instructions takes 
a precise exception. The link register is cleared by hard reset. 


2.2.5.5 Count Register (CTR) 

The count register (CTR) is a 32-bit register for holding a loop count that can be 
decremented during execution of branch instructions that contain an appropriately coded 
BO field. If the value in CTR is 0 before being decremented, it is -1 afterward. The count 
register provides the branch target address for the Branch Conditional to Count Register 
(bectrx) instruction. The CTR is shown in Figure 2-10. 


CTR 


Figure 2-10. Count Register (CTR) 


Prefetching instructions along the target path is also possible provided the count register is 
loaded sufficiently ahead of the branch instruction. 


The count register can be accessed by the mtspr and mispr instructions by specifying the 
SPR9. In branch conditional instructions, the BO field specifies the conditions under which 
the branch is taken. The first four bits of the BO field specify how the branch is affected by 
or affects the condition register and the count register. The encoding for the BO field is 
shown in Table 3-25. The counter register is cleared by hard reset. 
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2.3 Supervisor-Level Registers 


There are registers in the MPC601 that can be accessed only by supervisor-level software. 
These include the machine state register (MSR), the segment registers, and a number of 
SPRs. 


2.3.1 Machine State Register (MSR) 


The MSR, shown in Figure 2-11, is a 32-bit register that defines the state of the processor. 
When an exception occurs, MSR bits are altered in accordance with Table 2-10. The MSR 
can also be modified by the mtmsr, sc, and rfi instructions. It can be read by the mfmsr 
instruction. Note that in other PowerPC processors, the MSR is a 64-bit register. 


Note that the MPC601 does not implement the branch trace enable bit—BE (bit 22) or the 
recoverable exception bit —RE (bit 30). The state of these bits does not affect the operation 
of the MPC601. 


Reserved 





0 15 16171819 20 2122 23 24 25 26 27 28 2930 31 


Figure 2-11. Machine State Register (MSR) 
Table 2-10 shows the bit definitions for the MSR. 


Table 2-10. Machine State Registev Bit Settings 














External interrupt enable 
0 ‘The processor delays recognition of external interrupts and decrementer exception 
conditions. 


‘1 ‘The processor is enabled to take an external interrupt or the decrementer exception. 















Privilege level 
0 ‘The processor can execute both user- and supervisor-level instructions. 
1 The processor can only execute user-level instructions. 









Floating-point available 

0 ‘The processor prevents dispatch of floating-point instructions, including floating-point 
loads, stores and moves. Floating-point enabled program exceptions can still occur and 
the FPRs can still be accessed. 

1 The processor can execute floating-point instructions, and can take floating-point 

enabled exception type program exceptions. 








Machine check enable 
0 Machine check exceptions are disabled. 
1. Machine check exceptions are enabled. 


}20 | FEO | Floating-point exception mode 0 (See Table 2-11.) 
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Table 2-10. Machine State Register Bit Settings (Continued) 











Single-step trace enable 
0 The processor executes instructions normally. 
1. The processor generates a single-step trace exception upon the successful execution of 
the next instruction. When this bit is set, the processor dispatches instructions in strict 
program order. Successful execution means the instruction caused no other exception. 
Single-step tracing may not be present on all implementations. 


Reserved * on the MPC601. 
Floating-point exception mode 1 (See Table 2-11.) 
Reserved. This bit corresponds to the AL bit of the POWER architecture. 


Exception prefix. The setting of this bit specifies whether an exception vector offset is 
prepended with Fs or Os. In the following description, nnnnnis the offset of the exception. See 
Table 5-7. 

0 Exceptions are vectored to the physical address x'000n_nnnn'’. 

1 Exceptions are vectored to the physical address x’FFFn_nnnn’. 



















Instruction address translation 
0 Instruction address translation is disabled. When instruction translation is off, EAis 

interpreted as described in Chapter 6, “Memory Management Unit." 
Instruction address translation is enabled. 










1 







Data address translation 

0 ‘Data address translation is disabled. When data translation is disabled, EA is interpreted 
as described in Chapter 6, “Memory Management Unit.” 

1 Data address translation is enabled. 


30} — Reserved’ on the MPC601. 
rai [— __| Reserved on the MPCsOr. 


“These reserved bits may be used by other PowerPC processors. Attempting to change these bits does not 
affect the operation of the MPC601. These bit positions always return a zero value when read. 














The floating-point exception mode bits are interpreted as shown in Table 2-11. For further 
details see Section 5.4.7.1, “Floating-Point Enabled Program Exceptions.” Note that these 
bits are logically ORed, so that if either is set the processor operates in precise mode. 


Table 2-11. Floating-Point Exception Mode Bits 





Floating-point precise mode 


“Because FEO and FE1 are logically ORed on the 
MPC6E01, neither of these modes is available. If 
either bit is set, the processor operates in precise 
mode. 
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Table 2-12 indicates the state of the MSR after a hard reset: 
Table 2-12. State of MSR at Power Up 


28-30 0 (Reserved) 


2.3.2 Segment Registers 

The sixteen 32-bit segment registers are present only in 32-bit PowerPC implementations. 
Figure 2-12 shows the format of a segment register in the MPC601. Note that the fields in 
the segment register are interpreted differently depending on the value of bit 0 (the T bit). 





Reserved 


Reserved/BUID VSID/controller specific information 


012 3 78 31 
Figure 2-12. Segment Register Format (T = 0) 


Segment registers can be accessed by using the mtsr and mtsrin instructions. Segment 
register bit settings when T = () are described in Table 2-13. 


Table 2-13. Segment Register Bit Settings (T = 0) 


Bee sal T = 0 selects this format 
fks | Supervisor-state memory key 












User-state protection key : 
VSID Virtual segment ID 


Figure 2-13 shows the bit definition when T = 1. 
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BUID Controller Specific 


G4 "25-8 11 12 31 
Figure 2-13. Segment Register Format (T=1) 
The bits in the segment register when T = 1 are described in Table 2-14. 


Table 2-14. Segment Register Bit Settings (T = 1) 


Ce 
Ee Co 
ee 
ae ee 


If T=0 in the selected segment register, the effective address is a reference to an ordinary 
memory segment. For memory segments the segmented address translation mechanism 
may be superseded by the block address translation (BAT) mechanism. If not, the 52-bit 
virtual address (VA) is formed by concatenating the following: 


¢ The 24-bit VSID field from the segment register 
¢ The 16-bit page index, EA[4—19] 
¢ The 12-bit byte offset, EA[20-31] 











The VA is then translated to a physical address as described in Section 6.8, “Memory 
Segment Model.” 


If T=1 in the selected segment register, the effective address is a reference to an I/O 
controller interface segment. No reference is made to the page tables; address translation 
continues as described in Section 6.10, “I/O Controller Interface Address Translation.” 


The segment registers are cleared by hard reset. 


2.3.3 Supervisor-Level SPRs 


Many of the SPRs can be accessed only by supervisor-level instructions. Some SPRs are 
implementation-specitic; some SPRs in the MPC601 may not be implemented in other 
PowerPC processors, or may not be implemented in the same way. Table 2-15 summarizes 
how the MPC6()1 treats the undefined bits in supervisor-level SPRs. 


Some SPR bits are reserved, and should not be used. Some of these bits are used in other 
PowerPC processors. 
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Table 2-15. Undefined Bits in Supervisor-Level SPRs 


< Value Returned for Undefined 
Register Bits 


MSR Zero 


When the MPC6(0)1 detects SPR encodings other than those defined in this document, it 
either takes a program exception (if bit O of the SPR encoding is set) or it treats the 
instruction as a no-op (if bit 0 of the SPR encoding is clear). 















2.3.3.1 Synchronization for Supervisor-Level SPRs, and Segment 
Registers 
The processor has synchronization requirements when updating the following MMU 


registers when the corresponding address translation is enabled (data accesses with 
MSR[DT]=1 or instruction fetches with MSR[IR]=1): 


* SDRI1 
¢ BATs (if MSR[DT]=1 or MSR[IT]=1) 
¢ Segment registers 


In addition, there are other software requirements that should be observed when modifying 
these MMU registers and the MSR[IT] bit. 


2.3.3.1.1 Context Synchronization 


The processor checks for read and write dependencies with respect to segment registers and 
special purpose registers and executes series of instructions involving those registers so that 
dependencies are not violated. For example, if an mtspr instruction writes a value to a 
particular SPR and an mfspr instruction later in the instruction stream reads the same SPR, 
the mfspr reads the value written by the mtspr. 
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It is important to note that dependencies caused by side effects of writing to segment 
registers and SPRs are not checked automatically. If an mtspr instruction writes a value to 
an SPR that changes how address translation is performed, a subsequent load instruction 
cannot use the new translation until the CPU is explicitly synchronized by using one of the 
following context-synchronizing operations: 


¢ isync instruction 

* sc instruction 

¢ rfi instruction 

¢ Any exception, other than machine check and system reset 


Note that the sync instruction, although not defined as context-synchronizing in the 
PowerPC architecture, can sometimes be used to provide the required synchronization. The 
MPC601 processor automatically provides all synchronization required for updates to the 
CR, CTR, LR, MSR, FPSCR, and XER registers in all cases. 


In general, context-synchronizing operations are required when writes to the MMU 
registers are preceded or followed by load or store instructions. 


Specifically, a context-synchronizing operation or a sync instruction must precede a 
modification of the BAT or segment registers when the corresponding address translations 
are enabled (data accesses with MSR[DT]=1 or instruction fetches with MSR[IR]=1). In 
the case of the SDR1, a sync instruction must precede the modification of the SDR1 when 
the corresponding address translations are enabled (data accesses with MSR[DT]=1 or 
instruction fetches with MSR[IR]=1), guaranteeing that the reference and change bits are 
updated in the correct context. 


If the corresponding address translations are enabled (data accesses with MSR[DT]=1 or 
instruction fetches with MSR[IR]=1), a context synchronization operation must follow the 
modification of any of the above registers. 


When several of the registers listed above are modified with no intervening instructions that 
are affected by the changes, no context synchronization or sync instructions are required 
between the alterations. However, instructions fetched and/or executed after the alteration 
but before the context synchronizing operation may be fetched and/or executed in either the 
context that existed before the alteration or the context established by the alteration. 


For synchronization within a sequence of instructions, the isync instruction can be used as 
shown in example 1: 


Example 1: Using the isync instruction—In this example a single segment register (n) 
needs to be updated in a context where loads and stores might otherwise execute ahead of 
the mtsr instruction and use the outdated address translation. Data and instruction address 
translation is enabled (MSR[DT] = 1 and MSR[IT] = 1): 


isyne 
mtsr st,rn 
isync 
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The first isyne instruction allows all instructions in the pipeline to complete, allowing the 
mtsr instruction to dispatch and execute by itself. 


Example 2: Using the isync instruction with a series of register modifications—tIn 
example 1, the single mtsr instruction could safely be replaced with a series of mtsr 
instructions without each requiring a isync instruction. However, if both mtsr and mfsr 
instructions are needed, they should be separated by an isync instruction, as follows: 


isyne 
mtsr sr,rQ 
mtsr srr] 


mtsr sr,r7 
isync 

mfsr r&,sr 
minfsr r9,sr 


mfsr r15,sr 
isync 


Example 3: Using the rfi instruction—When several registers are updated with no 
intervening loads or stores with MSR[DT]=1 or instruction fetches with MSR[IT]=1, 
context-synchronization between updates is unnecessary. For example, when an exception 
is taken, the processor is synchronized automatically. In this example, a list of segment 
registers is updated with several mtsr instructions followed by a single context- 
synchronizing operation. 


Because this example modifies all 16 segment registers (and therefore, affects the segment 
register(s) that control instruction fetching, this particular sequence must be executed in 
direct address translation mode (MSR[IT] = 0). Therefore, no synchronization is required 
before the segment registers are loaded. Even if the segment register(s) that control 
instruction fetching is not to be reloaded, the sequence can be executed with instruction 
address translation enabled (MSR[IT] = 1) and no additional synchronization before the 
segment register instructions. 


In this example the rfi instruction provides the needed synchronization after all 16 segment 
registers are loaded and before translated loads and stores are executed. 


mtsr sr,r0 
mtsr srr] 


mtsr sr,r15 
<load rest of machine state> 
rfi 
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2.3.3.1.2 Other Requirements by Register 

SDR1 and MSR—The SDRI1 register should be modified only when MSR[IT] = 0. In 
addition, the MSR[IT] bit should be altered only by software that is has an address mapping 
such that logical addresses directly map to physical addresses. 


Segment Registers—The only fields that should be modified in the segment register that 
is currently in use for instruction fetching are the Ks and Kp bits. Note that any time the 
segment registers are updated, the changes are guaranteed to take affect (including changes 
of the Kx bits) only after a context-synchronizing operation has occurred. 


BAT Registers—The only fields that should be modified in the BAT register that is 
currently in use for instruction fetching are the Ks, Kp and the V (valid) bits. In the case of 
modifying the V bit for the BAT register currently in use for instruction accesses, the 
instructions immediately following the mtspr for the BAT register must also be mapped by 
the page address translation mechanism with the same logical to physical address mapping 
(or alternately, the instructions must be duplicated in the newly mapped space). Note that 
any time the BAT registers are updated, the changes are guaranteed to take affect (including 
changes of the Kx bits) only after a context-synchronizing operation has occurred. 


In order to make a BAT register pair valid in a manner such that the BTLB entry then 
translates the current instruction stream, the following sequence should be used if fields in 
both the upper and lower BAT registers are to be modified (for instruction address 
translation): 

1. The V bit in the BAT register pair should be cleared to 0. 

2. The other fields in the BAT register pair should be initialized appropriately. 
3. The V bit in the BAT register pair should be set to 1. 
4, 


A context-synchronizing operation should be performed 


2.3.3.2 DAE/Source Instruction Service Register (DSISR) 


The 32-bit DSISR, shown in Figure 2-14, identifies the cause of data access and alignment 
exceptions. 


0 31 
Figure 2-14. DAE/Source Instruction Service Register (DSISR) 


For information about bit settings, see Section 5.4.3, “Data Access Exception (x’00300”),” 
and Section 5.4.6, “Alignment Exception (x’00600’).” 


The DSISR is cleared after a hard reset. 


2.3.3.3 Data Address Register (DAR) 
The DAR is a 32-bit register shown in Figure 2-15. 
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Figure 2-15. Data Address Register ( DAR) 


After a data access, I/O controller interface error, or alignment exception, DAR is set to the 
effective address of a load or store element. For information, see Section 5.4.3, “Data 
Access Exception (x’00300’),” and Section 5.4.6, “Alignment Exception (x’00600’).” 


2.3.3.4 Decrementer (DEC) Register 

The DEC, shown in Figure 2-16, is a 32-bit decrementing counter that provides a 
mechanism for causing a decrementer exception after a programmable delay. On the 
MPC601, the DEC is driven by the same frequency as the RTC (7.8125 MHz). On other 
PowerPC processors, the DEC frequency is based on a subdivision of the processor clock. 
The DEC is cleared by hard reset. Note that if an external clock is connected to the RTC, 
the DEC can change from its original value of zeros without receiving an instruction to load 
the register. 


DEC 


Figure 2-16. Decrementer Register (DEC) 


2.3.3.4.1 Decrementer Operation 
The DEC counts down, causing an exception (unless masked) when it passes through zero. 
The DEC satisfies the following requirements: 


¢ The operation of the RTC and the DEC are coherent (that is, the counters are driven 
by the same fundamental time base). 


¢ Loading a GPR from the DEC has no effect on the DEC. 
¢ Storing a GPR to the DEC replaces the value in the DEC with the value in the GPR. 


¢ Whenever bit () of the DEC changes from 0 to 1, a decrementer exception request is 
signaled. (The exception breaks the pipeline in such a way that instructions in the 
execute state (except for instructions that have been dispatched ahead of 
undispatched integer instructions) complete execution, and instructions in decode 
stage remain undecoded until the exception handler returns control to the interrupted 
program.) Multiple DEC exception requests may be received before the first 
exception occurs; however, any additional requests are canceled when the exception 
occurs for the first request. 


¢ Ifthe DEC is altered by software and the content of bit 0 is changed from 0 to 1, an 
exception request is signaled. 
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The RTC input is sampled using the CPU clock. Therefore, if the CPU clock is less than 
twice the RTC frequency, real-time clock (and decrementer) sampling and incrementing 
errors will occur. Therefore, in systems that change the CPU clock frequency dynamically 
beyond this limit, a method of saving and restoring the real-time clock register values via 
external means is required. 


2.3.3.4.2 Writing and Reading the DEC 


The content of the DEC can be read or written using the mfspr and mtspr instructions, both 
of which are supervisor-level when they refer to the DEC. However, the MPC601 also 
allows the reading of the DEC in user mode (for POWER compatibility) via the SPR6 
register. Using a simplified mnemonic for the mtspr instruction, the DEC may be written 
from GPR rA with the following: 


mtdec rA 
If the execution of this instruction causes bit 0 of the DEC to change from 0 to 1, an 


exception request is signaled. The DEC may be read into GPR rA with the following 
sequence: 


mfdec rA 


x 


Copying the DEC to a GPR does not affect the DEC content or the exception mechanism. 


2.3.3.5 Table Search Descriptor Register 1 (SDR1) 
The table search descriptor register 1 (SDR1) is shown in Figure 2-17. 


Reserved 





HTABORG HTABMASK 


0 15 16 22 23 31 


Figure 2-17. Table Search Descriptor Register 1 (SDR1) 


The bits of the SDR1 are described in Table 2-16. 
Table 2-16. Table Search Descriptor Register 1 (SDR1) Bit Settings 







Description 


HTABORG The high-order 16 bits of the 32-bit physical address of the page table 
[ae eae ern 
HTABMASK | Mask for page table address 


The HTABORG field in SDR1 contains the high-order 16 bits of the 32-bit physical address 
of the page table. Therefore, the page table is constrained to lie on a 2'° byte (64 Kbytes) 
boundary at a minimum. At least 10 bits from the hash function are used to index into the 
page table. The page table must consist of at least 64 Kbytes 2!° PTEGs of 64 bytes each. 
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The page table can be any size 2” where 16 <n < 25. As the table size is increased, more 
bits are used from the hash to index into the table and the value in HTABORG must have 
more of its low-order bits equal to 0. The HTABMASK field in SDR1 contains a mask 
value that determines how many bits from the hash are used in the page table index. This 
mask must be of the form b’00...011...1’; that is, a string of 0 bits followed by a string of 1 
bits. The 1 bits determine how many additional bits (at least 10) from the hash are used in 
the index; HTABORG must have this same number of low-order bits equal to 0. See 
Figure 6-21. 


The number of low-order 0 bits in HTABORG must be at least the number of 1 bits in 
HTABMASK so that the final 32-bit physical address can be formed by logically ORing 
the various components. 


2.3.3.6 Machine Status Save/Restore Register 0 (SRRO) 

The machine status save/restore register 0 (SRRO) is a 32-bit register the MPC6()1 uses to 
save machine status on exceptions and restore machine status when an rfi instruction is 
executed. It also holds the EA for the instruction that follows the System Call (sc) 
instruction. The SRRO is shown in Figure 2-18. 


0 3t 
Figure 2-18. Save/Restore Register 0 (SRRO) 


When an exception occurs, SRRO is set to point to an instruction such that all prior 
instructions have completed execution and no subsequent instruction has begun execution. 
The instruction addressed by SRRO may not have completed execution, depending on the 
exception type. SRRO addresses either the instruction causing the exception or the 
immediately following instruction. The instruction addressed can be determined from the 
exception type and status bits. 


The SRRO is cleared by hard reset. 


For information on how specific exceptions affect SRRO, refer to the descriptions of 
individual exceptions in Chapter 5, “Exceptions.” 


2.3.3.7 Machine Status Save/Restore Register 1 (SRR1) 


The SRR1 is a 32-bit register used to save machine status on exceptions and to restore 
machine status when an rfi instruction is executed. The SRR1 is shown in Figure 2-19. 


0 15 16 31 


Figure 2-19. Machine Status Save/Restore Register 1 (SRR1) 
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In general, when an exception occurs, bits 0-15 of SRR1 are loaded with exception-specific 
information and bits 16-31 of MSR are placed into bits 16-31 of SRR1. 


The SRR1 is cleared by hard reset. 


For information on how specific exceptions affect SRR1, refer to the individual exceptions 
in Chapter 5, “Exceptions.” 


2.3.3.8 General SPRs (SPRGO—-SPRG3) 


SPRGO through SPRG3 are 32-bit registers provided for general operating system use, such 
as performing a fast state save and for supporting multiprocessor implementations. 
SPRGO-SPRG3 are shown in Figure 2-20. 


0 31 












Figure 2-20. General SPRs (SPRGO-SPRG3) 
Uses for SPRGO-SPRG3 are shown in Table 2-17. 
Table 2-17. Uses of SPRGO-SPRG3 


SPRGO | Software may load a unique physical address in this register to identify an area of memory reserved for 
use by the exception handler. This area must be unique for each processor in the system. 


SPRG1_ | This register may be used as a scratch register by the exception handler to save the content of a GPR. 
That GPR then can be loaded from SPRGO and used as a base register to save other GPRs to memory. 

SPRG2 ]j This register may be used by the operating system as needed. 

SPRG3 | This register may be used by the operating system as needed. 


2.3.3.9 External Access Register (EAR) 


The EAR is a 32-bit SPR that controls access to the external control facility and identifies 
the target device for external control operations. The external control facility provides a 


means for user-level instructions to communicate with special external devices. The EAR 
is shown in Figure 2-21. 













MOTOROLA Chapter 2. Registers and Data Types 2-31 


Reserved 





01 27 28 31 


Figure 2-21. External Access Register (EAR) 


This register is provided to support the External Control Input Word Indexed (eciwx) and 
External Control Output Word Indexed (ecowx) instructions, which are described in 
Chapter 10, “Instruction Set.” Although access to the EAR is privileged, the operating 
system can determine which tasks are allowed to issue external access instructions and 
when they are allowed to do so. The bit settings for the EAR are described in Table 2-18. 
Interpretation of the physical address transmitted by the eciwx and ecowx instructions and 
the 32-bit value transmitted by the ecowx instruction is not prescribed by the PowerPC 
architecture but is determined by the target device. 


For example, if the external control facility is used to support a graphics adapter, the ecowx 
instruction could be used to send the translated physical address of a buffer containing 
graphics data to the graphics device. The ecowx instruction could be used to load status 
information from the graphics adapter. 


Table 2-18. External Access Register (EAR) Bit Settings 


ee 
specified external operation. If the bitis cleared, an eciwx or ecowx 


oh instruction causes a data access exception. 


28-31 Resource ID. The RID is formed by concatenating TBST||TSIZO— 
TS!Z2. Note that in other PowerPC implementations, this field may 
use bits 26-31. 


This register can also be accessed by using the mtspr and mfspr instructions using the 


value 282, b’01000 11010’. When reading from the EAR, the following sequence should 
be used: 










Enable bit 
1 Enabled 
0 = Disabled 


If this bit is set, the eciwx and ecowx instructions can perform the 





sync 
mfspr rD,282 
sync 


The EAR is cleared by hard reset. 
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2.3.3.10 Processor Version Register (PVR) 


The PVR is a 32-bit, read-only register that identifies the version and revision level of the 
PowerPC processor (see Figure 2-22). The contents of the PVR can be copied to a GPR by 
the mfspr instruction. Read access to the PVR is available in supervisor mode only; write 
access is not provided. 


Version Revision 


0 15 16 31 
Figure 2-22. Processor Version Register (PVR) 


The PVR consists of two 16-bit fields: 


* Version (bits 0-15)—A 16-bit number that identifies the version of the processor 
and of the PowerPC architecture. 


— The processor version number is x'0001' for the MPC601. 


— Other processor numbers assigned as of the initial release of the MPC601 are as 
follows 


x’0003’ 
x’0004’ 
x’0014’ 
¢ Revision (bits 16-31)—A 16-bit number that distinguishes between various releases 
of a particular version, (that is, an engineering change level). The Value of the 
revision portion of the PVR is implementation-specific. 


— The initial processor revision level is x'0000' and will be changed for each 
revision of the device. 


The PVR is set to x’00010001’ by hard reset. 


2.3.3.11 BAT Registers 

The MPC601 includes eight block-address translation (BAT) registers, consisting of four 
pairs of BATs (BATOU-BAT3U and BATOL-—BAT3L), as shown in Figure 2-1. Note that 
this differs somewhat from other PowerPC implementations, which have two sets of four 
pairs of BAT registers. One set contains instruction BATS, or IBATs, IBATOQU-IBAT3U 
and IBATOL-IBAT3L), which maps to the BAT registers implemented in the MPC601. The 
SPR numbers for these registers are listed in Figure 2-1 The additional eight registers are 
data BATs, or DBATs, (DBATOU-DBAT3U and DBATOL-DBAT3L). These BATs use the 
eight SPR numbers subsequent to those used by the IBATs (536-543). 


Note that the implementation of the bit fields with in the BATs are different from the other 
PowerPC implementations. Figure 2-23 and Figure 2-24 show the format of the upper and 
lower BAT registers. 
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0 14 15 2425 27 28 2930 31 





Reserved 
Figure 2-23. Upper BAT Register 


Reserved 





0 14 15 24 25 26 31 
Figure 2-24. Lower BAT Register 
Table 2-19 describes the bits in the BAT registers. 
Table 2-19. BAT Registers 


Register 


Block logical page index. This field is compared with bits 0-14 of the logical 
BAT address to determine if there is a hit in that BTLB entry. 
Registers 


Memory/cache access mode bits 

W_ Write-through 

1 Caching-inhibited 

M Memory coherence 

For detailed information about the WIM bits, see Section 6.3, “Memory/Cache 
Access Modes." 


Supervisor mode key. This bit interacts with MSR[PR] and the PP field to 
determine the protection for the block. For more information, see Section 6.4, 
“General Memory Protection Mechanism." 


User mode key. This bit also interacts with MSR[PR] and the PP field to 
determine the protection for the block. For more information, see Section 6.4, 
“General Memory Protection Mechanism." 


Protection bits for block. This field interacts with MSR[PR] and the Ks or Ku to 
determine the protection for the block as described in Section 6.4, “General 
Memory Protection Mechanism." 


Physical block number. This field is used in conjunction with the BSM field to 
BAT generate bits 0-14 of the physical address of the block. 
Registers 


Block size mask (0...5). BSM is a mask that encodes the size of the block. 
Values for this field are listed in Table 2-20. 





Table 2-20 lists the BAT area lengths encoded in by BAT[BSM]. 
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Table 2-20. BAT Area Lengths 


BAT Area F 
Length BSM Encoding 
128 Kbytes 00 0000 
256 Kbytes 00 0001 


512 Kbytes 00 0011 


Only the values shown in Table 2-20 are valid for the BSM field. The rightmost bit of BSM 
is aligned with bit 14 of the logical address. An logical address is determined to be within 
a BAT area if the logical address matches the value in the BLPI field. 





The boundary between the string of zeros and the string of ones in BSM determines the bits 
of logical address that participate in the comparison with BLPI. Bits in the logical address 
corresponding to ones in BSM are cleared for this comparison. 


Bits in the logical address corresponding to ones in the BSM field, concatenated with the 
17 bits of the logical address to the right (more significant bits) of BSM, form the offset 
within the BAT area. 


The value loaded into BSM determines both the length of the BAT area and the alignment 
of the area in both logical and physical address space. The values loaded into BLPI and 
PBN must have at least as many low-order zeros as there are ones in BSM. 


The BAT registers are cleared by hard reset. 


2.3.3.12 MPC601 Implementation-Specific HID Registers 

PowerPC processors may have implementation-specific SPRs, referred to as HID registers. 
Additional SPR encodings allow access to the implementation-dependent registers within 
the MPC601. The SPR encodings for the MPC601’s HID registers are described in 
Table 2-21. Note that these encodings use split-field notation; that is, the order of two 5-bit 
components of the 10-bit encoding is reversed. 
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Table 2-21. Additional SPR Encodings 


see[ meee 


For additional information about the mtspr and mfspr instructions, refer to Chapter 10), 
“Instruction Set.” 


2.3.3.12.1 Checkstop Sources and Enables Register—HIDO 


The checkstop sources and enables register (HIDO), shown in Figure 2-25, is a supervisor- 
level register that defines enable and monitor bits for each of the checkstop sources in the 
MPC601. The SPR number for HIDO is 1008. 












HIDO 
EDT EBA EBD 
ESH ECP 
ECD EIU 
ETD EPP 






0 2 3 4 5 6 7 8 9 10 1112 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
DRF ———! | 
DRL | 
PAR 

Reserved EMC EHP 





Figure 2-25. Checkstop Sources and Enables Register (HIDO) 


Table 2-22 defines the bits in HIDO. The enable bits (bits 15-31) can be used to mask 
individual checkstop sources, although these are provided primarily to mask off any false 
reports of such conditions for debugging purposes. Bit 0 (HIDO[CE]) is a master checkstop 
enable; if it is cleared, all checkstop conditions are disabled; if it is set, individual 
conditions can be enabled separately. HIDO[EM] (bit 16) enables and disables machine 
check checkstops; clearing this bit masks machine check checkstop conditions that occur 
when MSR[ME] is cleared. Bits 1-11 are the checkstop source bits, and can be used to 
determine the specific cause of a checkstop condition. 


All enable bits except 15 and 24 are disabled at start up. The operating system should enable 
these checkstop conditions before the power-on reset sequence is complete. 
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Table 2-22. Checkstop Sources and Enables Register (HIDO) Definition 






Description 


Cc Master checkstop enable. Enabled if set. 


m 


Microcode checkstop detected if set. 


Double machine check detected if set. 


4 
Oo 


Multiple TLB hit checkstop if set. 


UV [nlaricesot isnt SSCS 





cD Multiple cache hit checkstop if set. 
SH 


T 


Sequencer time out checkstop if set. 


Oo 


Dispatch time out checkstop if set. 
Bus address parity error if set. 


Bus data parity error if set. 


QO 
me) 


Cache parity error if set. 


Oo 


Invalid microcode instruction if set. 
V/O controller interface access protocol error if set. 
12-14 Reserved 


ES Enable microcode checkstop. Enabled by hard reset. Enabled if set. 


= 
ao 


Enable machine check checkstop. Disabled by hard reset. Enabled if set. 


= 
N 


D Enable TLB checkstop. Disabled by hard reset. Enabled if set. 





= 
eo 


ECD Enable cache checkstop. Disabled by hard reset. Enabled if set. 


19 S) Enable sequencer time out checkstop. Disabled by hard reset. Enabled if set. 





20 


Enable dispatch time out checkstop. Disabled by hard reset. Enabled if set. 


BA Enable bus address parity checkstop. Disabled by hard reset. Enabled if set. 


iw) 
De) 


Enable bus data parity checkstop. Disabled by hard reset. Enabled if set. 


nr 

= 

mi) myjm 
doi =z 


to 
@ 


= 


nh 
(52) 


ECP Enable cache parity checkstop. Disabled by hard reset. Enabled if set. 


24 EIU Enable for invalid ucode instruction checkstop. Enabled by hard reset. Enabled if set. 


Enable for I/O controller interface access protocol checkstop. Disabled by hard reset. ; 
Enabled if set. 


26 DRF 0 Optional reload of alternate sector on instruction fetch miss is enabled. 


Optional reload of alternate sector on instruction fetch miss is disabled. 


DRL 0 Optional reload of alternate sector on load/store miss is enabled. 








1 Optional reload of alternate sector on load/store miss is enabled. 


Q —_— Big-endian mode is enabled. 

1 _Little-endian mode is enabled. 

For more information about byte ordering, see Section 2.4.3, “Byte and Bit Ordering.” Note 
that in the PowerPC architecture, the selection between big- and little-endian mode is 
controlled by two bits in the MSR. 


29 PAR 0 Precharge of the ARTRY and SHD signals is enabled. 
1 Precharge of the ARTRY and SHD signals is disabled. 
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Table 2-22. Checkstop Sources and Enables Register (HIDO) Definition (Continued) 


1 a a 


No error detected in main cache during array initialization. 
Error detected in main cache during array initialization. 


The HP_SNP_REQ signal is disabled. Use of the WRS queue position is restricted to a 
snoop hit that occurs when a read is pending. That is, its address tenure is complete but 
the data tenure has not begun. 

The HP_SNP_REQ signal is enabled. Use of the WRS queue position is restricted to a 
snoop hit on an address tenure that had HP_SNP_REQ asserted. 





The HIDO register is set to x’80010080’ by the hard reset operation. 


2.3.3.12.2 MPC601 Debug Modes Register—HID1 


The MPC6()1 debug modes register (HID1) is a supervisor-level register that defines enable 
bits for the various debug modes supported by the MPC60)1; see Figure 2-26. The SPR 
number for HID1 is 1009. 


HID1 


Reserved 





9 10 16 17 31 
Figure 2-26. MPC601 Debug Modes Register 


Table 2-23 shows bit settings for the HID1 register. Note that if both the single instruction 
step option is specified for the M field (b’100’) and the trap to run mode exception option 
is specified in the RM field (b’10’), the processor iterates in an infinite loop. 


Table 2-23. HID1 Register Definition 


a 
LC 
110 Full instruction address compare 


1-3 MPC601 run modes 
111 Full branch target address compare 


000 Normal run mode 
a 








001 Undefined. Do not use. 

010 Limited instruction address compare. 
011 Undefined. Do not use. 

100 Single instruction step 

101 Undefined. Do not use. 
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Table 2-23. HID1 Register Definition (Continued) 


cm Poe Pte 


Response to address compare or single step 
Hard stop (Stop L1 clocks.) 
Soft stop (Wait for system activity to quiesce.) 
Trap to run mode exception (address vector x'02000'), with the base address 
indicated in by the setting of MSR[IP]}. This mode is valid for address comparisons 
and may produce unpredictable results when used with HID single-instruction step 


mode. 
Reserved. Do not use. 


10-16 eee 7 10-16 Reserved. Do not use. 


47-31 MISC Miscellaneous latches 
17 When high, this bit disables the broadcast of the tlbie instruction. 
18-31 Reserved. Do not use. 


The HID1 register is cleared by a hard reset. 


2.3.3.12.3 Instruction Address Breakpoint Register (IABR)—HID2 

The instruction address breakpoint register (IABR), is also HID2. The [ABR, shown in 
Figure 2-27, is a supervisor-level register defined to hold an effective address that is used 
to compare with the logical address of the instruction in the decode phase of the pipeline. 
The results of the comparison are used differently depending on the debug mode used. 





ee Reserved 





0 29 30 31 


Figure 2-27. Instruction Address Breakpoint Register (IABR)—HID2 


Table 2-24 lists HID2 register definitions. The HID2 register is cleared by the hard reset 
operation. 


The SPR number for HID2 is 1010. 
Table 2-24. HID2 Register Definition 
Ta eee [een 


Comparison effective address 
30-31 I— | Reserved. Should be set to zero. 
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2.3.3.12.4 Data Address Breakpoint Register (DABR)—HID5 

The data address breakpoint register (DABR) (HIDS), as shown in Figure 2-28, is designed 
to hold an effective address that is used to compare with the effective address of the various 
memory access instructions. The results of the comparison are used to cause a data access 
exception when the appropriate MPC601 debug mode bits are set (as described in 
Section 2.3.3.12.2, “MPC601 Debug Modes Register—HID1”). 


HID5 


Reserved 





0 28 29 30 31 


Figure 2-28. Data Address Breakpoint Register (DABR) 


Table 2-25 describes bit settings in HIDS. The HIDS register is cleared by the hard reset 
operation. 


Table 2-25. HiD5 Register Definition 


Description 


Data address breakpoint (EA). This field is set to the double-word EA to compare with 
enabled load or store EAs. 


Reserved, although on an mfspr (DABR), the value returned is the value last written. 


Memory access types: 

00 Breakpoints disabled 

01 Breakpoints load accesses only 

10 Breakpoints store accesses only 

11. Breakpoints both foad and store accesses 





The SPR number for HIDS is 1013. 


If the DABR feature is enabled, operations that hit against a properly enabled DABR cause 
a data access exception. For this type of data access exception (DAE), bit 9 of the DSISR 
is set and the data address register (DAR) contains the EA that caused the DABR match. If 
the access crossed a double-word boundary, the DAR contains the EA of the access from 
the first double word (even if the DABR match was on the second double word). For more 
information about data access exceptions, see Section 5.4.3, “Data Access Exception 
(x’00300’).” 


Table 2-26 describes how each instruction type interacts with the DABR feature. 
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Table 2-26. DABR Results 


Load If any part of the load access touches the double word specified in the DABR, and the appropriate 
instructions enable bit is set, then the DAE occurs, In this case, the memory read operation is inhibited and 
register rS is not updated. If the operation is a load with update, the update to register rAis also 
inhibited. 


























If any part of the store access touches the double word specified in the DABR and the appropriate 
enable bit is set, the DAE occurs and the memory access is inhibited. 
If the operation is a store with update, then the update to register rAis also inhibited. 


Store 
instructions 






If the operation is a Store Conditional instruction and the reservation bit is not set at the time of the 
DABR compare (at the end of execution as soon as the EA is calculated), the DAE is not taken. 




















Loadandstore | These instructions are sequenced one register (one word) at a time through the IU for EA 





string and calculation. Each access is checked against the DABR as it is presented to the ATU. If a match 
multiple occurs, the instruction is aborted and a DAE is taken. 
instructions If the initial EA for the string or multiple is not word-aligned, some individual accesses may cross a 





double word boundary. If either double word hits in the DABR, the access is inhibited and the DAE 
occurs. 


iscbx This instruction is not supported by the DABR Feature. No DAE occurs, even if the EA matches. 
Instruction 

Cache control | These instructions are not supported by the DABR Feature. No DAE occurs even if the EA 
instructions matches. 


2.3.3.12.5 Processor Identification Register (PIR)—HID15 

The PIR register, shown in Figure 2-29, is a 32-bit, supervisor-level register that holds the 
4-bit processor identification tag (PID). This tag is useful for processor differentiation in 
multiprocessor system designs. The tag is also used to identify the sender and receiver tag 
for I/O controller interface operations. For more information, see Section 9.6, “I/O 
Controller Interface Operation.” The PIR can be accessed by the mfspr instruction by using 
the SPR number 1023, as follows: 















sync 
mfspr rD,1023 
sync 


The PIR is cleared by the hard reset operation. 


PIR 


Reserved 





0 28 29 31 


Figure 2-29. Processor Identification Register (PIR) 
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2.4 Operand Conventions 


This section describes the conventions used for storing values in registers and memory. 


2.4.1 Effect of Operand Placement on Performance 


‘The placement (location and alignment) of operands in memory affect the relative 


performance of memory accesses. The best performance is guaranteed if memory operands 
are aligned. To obtain the best performance across the widest range of PowerPC processor 
implementations, the programmer should assume the performance model described in 
Figure 2-30 with respect to the placement of memory operands. 









Operand Boundary Crossing 


Integer 


8 Byte 8 Optimal 
4 Good Good ater ace 
<4 Poor Poor = — 
4 Byte Optimal 
Good Pen. 
2 Byte Optimal 
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1 Byte | 1 Byte | 
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Poor Poor Poor Poor 
4 Byte optimal 
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Figure 2-30. Performance Effects of Memory Operand Placement 



















The performance of accesses varies depending on the following: 


Operand size 

Operand alignment 

Crossing a cache block (sector) boundary 
Crossing a page boundary 

Crossing a BAT boundary 

Crossing a segment boundary 


The load/store multiple instructions are defined to operate only on aligned operands. The 
Move Assist instructions have no alignment requirements. For the purposes of Figure 2-30, 
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crossing pages with different memory control attributes (WIM bits) is equivalent to 
crossing a segment boundary. 


2.4.1.1 Instruction Restart 

If amemory access crosses a page or segment boundary, a number of conditions could abort 
the execution of the instruction after part of the access has been performed. For example, 
this may occur when a program attempts to access a page it has not previously accessed or 
when the processor must check for a possible change in memory attributes when an access 
crosses a page boundary. When this occurs, the MPC60)1 or the operating system may 
restart the instruction. If the instruction is restarted, some bytes at that word address may 
be loaded from or stored to the target location a second time. 


The following rules apply to memory accesses with regard to restarting the instruction. 


¢ Aligned accesses—A single-register instruction that accesses an aligned operand is 
never restarted. 


¢ Misaligned accesses—A single-register instruction that accesses a misaligned 
operand may be restarted if the access crosses a page, BAT, or segment boundary. 


¢ Load/store multiple, move assist—These instructions may be restarted if, in 
accessing the locations specified by the instruction, a page, BAT, or segment 
boundary is crossed. 


2.4.1.2 Atomicity 


All aligned accesses are atomic. Instructions causing multiple accesses (for example, 
load/store multiple and move assist instructions) are not atomic. 


2.4.1.3 Access Order 


The ordering of memory accesses is not guaranteed unless the programmer inserts the 
appropriate ordering instructions, even if the accesses are generated by a single instruction. 
Misaligned accesses, load/store multiple instructions, and move assist instructions have no 
implicit ordering characteristics. For example, processor A may store a word operand on an 
odd half-word boundary. It may appear to processor A that the store completed atomically. 
Processor or other mechanism B, executing a load from the same location, may get a result 
that is a combination of the value of the first half word that existed prior to the store by 
processor A and the value of the second half word stored by processor A. 


2.4.2 Data Organization in Memory and Data Transfers 


Bytes in memory are numbered consecutively starting with 0. Each number is the address 
of the coiresponding byte. 


Memory operands may be bytes, half words, words, or double words, or, for the load/store 
multiple and move assist instructions, a sequence of bytes or words. The address of a 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. 
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2.4.2.1 Alignment and Misaligned Accesses 


The operand of a single-register memory access instruction has a natural alignment 
boundary equal to the operand length. In other words, the “natural” address of an operand 
is an integral multiple of the operand length. A memory operand is said to be aligned if it 
is aligned at its natural boundary; otherwise it is misaligned. 


Operands for single-register memory access instructions have the characteristics shown in 
Table 2-27. (Although not permitted as memory operands, quad words are shown because 
quad-word alignment is desirable for certain memory operands.) 


Table 2-27. Memory Operands 


eae (eet Ree et 
Double word x000 


Note: An “x” in an address bit position indicates that the bit 
can be 0 or 1 independent of the state of other bits in 
the address. 














The concept of alignment is also applied more generally to data in memory. For example, 
12 bytes of data are said to be word-aligned if its address is a multiple of four. 


Some instructions require their memory operands to have certain alignments. In addition, 
alignment may affect performance. For single-register memory access instructions, the best 
performance is obtained when memory operands are aligned. Additional effects of data 
placement on performance are described in Chapter 7, “Instruction Timing.” 


Instructions are four bytes long and word-aligned. 


2.4.3 Byte and Bit Ordering 


The PowerPC architecture supports both big- and little-endian byte ordering. The default 
byte- and bit ordering is big-endian, as shown in Figure 2-31. Byte ordering can be set to 
little-endian by setting the LM bit in the HIDO register. Note that the mechanism for 
selecting between byte orderings is different in the MPC601 than it is in the PowerPC 
architecture. The PowerPC architecture provides two enable bits in the MSR that allow 
independent control for user- and supervisor-level software. 
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MSB 


Byte Byte N fran) 


Big-Endian Byte Ordering 
msb bit n (max) 


Pots | 2 | Ln 


Big-Endian Bit Ordering 
Figure 2-31. Big-Endian Byte and Bit Ordering 


If scalars (individual computational data items) were indivisible, the concept of byte 
ordering would be unnecessary. Order of bits or groups of bits within the smallest 
addressable unit of memory is irrelevant, because nothing can be observed about such 
order. Order matters only when scalars, which the processor and programmer regard as 
indivisible quantities, can be made up of more than one addressable units of memory. 


For a device in which the smallest addressable unit is the 64-bit double word, there is no 
question of the order of bytes within double words. All scalar transfers between registers 
and system memory are for double words and the address of the byte containing the high- 
order eight bits of a scalar is no different from the address of a byte containing any other 
part of the scalar. 


For PowerPC processors, as for most recent processor designs, the smallest addressable 
memory unit is the byte (8 bits), and most scalars are composed of groups of bytes. When 
a 32-bit scalar is moved from a register to memory, it occupies four consecutive byte 
addresses, and a decision must be made regarding the order of these bytes in these four 
addresses. 


The choice of byte ordering is arbitrary. Although there are 24 ways (4!) to specify the 
ordering of four bytes within a word, illustrated as all the permutations of ordering of four 
elements—ABCD, ABDC, ACBD, ACDB...DBCA, DCAB, DCBA—where A corresponds 
to the lowest address and D the highest, only two of these orderings are practical—ABCD 
(big-endian) and DCBA (little-endian). 


2.4.3.1 Big-Endian Byte Ordering 

Big-endian ordering assigns the lowest address to the highest-order eight bits of the scalar. 
This is called big-endian because the big end of the scalar, considered as a binary number, 
comes first in memory. 


2.4.3.2 Little-Endian Byte Ordering 

Little-endian byte ordering assigns the lowest address to the lowest-order (rightmost) 8 bits 
of the scalar. The little end of the scalar, considered as a binary number, comes first in 
memory. 
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2.4.4 Structure Mapping Examples 


The following C programming example contains an assortment of scalars and one character 
string. The value presumed to be in each structure element is shown in hexadecimal in the 
comments and are used below to show how the bytes that comprise each structure element 
are mapped into memory. 


struct { 
int a; /* 711121314 word */ 
double b; /* x°212223242225262728 doubleword */ 
char * C3 /* x’3132334 word */ 
char d[7}; /* ‘A’,“B’,‘C’,‘D’,‘E’,‘F’,‘G’ array of bytes =) 
short e; /* x?5152’ halfword */ 
int f; /* x’61626364’ word */ 


}s; 


Note that the C structure mapping introduces padding (skipped bytes) in the map in order 
to align the scalars on their proper boundaries—4 bytes between a and 5b, one byte between 
d and e, and two bytes between e and f. Both big- and little-endian mappings use the same 
amount of padding. 


2.4.4.1 Big-Endian Mapping 

The big-endian mapping of a structure S is shown in Figure 2-32. Addresses are shown in 
hexadecimal at the left of each double word and in small figures below each byte. The 
content of each byte, as shown in the preceding C programming example, is shown in 
hexadecimal as characters for the elements of the string. 


00 
23 24 

0s OA OB 

10 

18 

20 





Figure 2-32. Big-Endian Mapping of Structure S 
2.4.4.2 Little-Endian Mapping 


Figure 2-33 shows the structure, S, using little-endian mapping. Double words are laid out 
from right to left. 





2-46 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


11 12 13 14 
07 06 05 04 03 02 01 00 
21 22 2 24 25 26 27 28 


OF 0E oD 0c 0B 0A 09 08 
‘D' ‘Cc ‘B’ ‘A 31 32 33 34 








17 16 15 14 13 12 11 10 
51 52 ‘G F = 

1F 1E iD 61G | 6B 1A 19 18 

61 62 63 64 

23 22 21 20 





Figure 2-33. Little-Endian Mapping of Structure S 


2.4.5 PowerPC Byte Ordering 


The default mapping for PowerPC processors is big-endian. Little-endian mode can be 
selected after a hard reset by setting the LM bit in the HIDO register in the MPC601 through 
the use of the mtspr instruction in the hard reset handler. The location of the bit is unique 
for each PowerPC processor. 


2.4.6 PowerPC Data Memory with LM Set 


One might expect that with the LM bit set (little-endian mode), that the system would have 
to perform two-, four-, or eight-way byte swaps when transferring a half word, word, or 
double word between memory and a register. However, the PowerPC architecture emulates 
little-endian byte ordering by manipulating the three low-order bits of the effective address. 
No bytes are swapped and individual multiple-byte scalars appear in memory in big-endian 
order. Setting LM adjusts the way effective addresses are computed without affecting the 
transfer of data between memory and registers, which is unencumbered by the need for 
multiplexers to swap bytes. 


2.4.6.1 Aligned Scalars 


For the load and store instructions listed in Table 2-28, the effective address is computed as 
specified in the instruction descriptions in Chapter 3, “Addressing Modes and Instruction 
Set Summary,” and is modified as shown in Table 2-29. 


Table 2-28. Load/Store Instructions for Data Aligned on Natural Boundaries 


fibz | Load Byte and Zero 
iow Load Byle and Zero with Update 
fezux | Load Byte and Zero with Update Indexed 











Load Byte and Zero Indexed 
Load Floating-Point Double-Precision 
Load Floating-Point Double-Precision with Update 
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Table 2-28. Load/Store Instructions for Data Aligned on Natural Boundaries 


2-48 




































fp nstcton 


Load Word Algebraic with Update Indexed 
Load Word Algebraic Indexed 


[eacWeageraendees | 
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Table 2-28. Load/Store !nstructions for Data Aligned on Natural Boundaries 


[ tnomonic | tnstucton 
[ate | Se Ring Pa ge Weed 
[ate [Ser Faia Sie Pedson 
[ar Se Rin Pt Sige Pen wpa 
[sur | Sore Rn Pt Sige Presn wi pd abe | 
[six [ Sean Pot geese 


Store Half Word with Update Indexed 
























pac 

pene Sa 

eae 

Cae 

|sthx Store Half Word Indexed 

a 
[atin [Se Wd ByeRoweemired 
Ec 
feo | SereWeawindpate 
[som SoreWeinUpaaeinimed 
feo [Smeviediied 


*Not implemented in the MPC601 


Table 2-29 shows how the EA is modified. 
Table 2-29. EA Modifications 


Data Width (Bytes) EA Modification 





ee 


The modified EA is passed to the data cache or the main memory and the specified width 
of the data is transferred between a GPR or FPR and the (as modified) addressed memory 
locations. Although the data is stored using big-endian byte ordering (but not in the same 
bytes within double words as with LM = 0), the modification of the EA makes it appear to 
the processor that it is stored in little-endian mode. 





The structure S would be placed in memory as shown in Figure 2-34. 
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Figure 2-34. PowerPC Little-Endian Structure Sin Memory or Cache 


Because of the modifications on the EA, the same structure S appears to the processor to be 
mapped into memory this way when LM = 1 (little-endian enabled). This is shown in 
Figure 2-35. 








Figure 2-35. PowerPC Little-Endian Structure S as Seen by Processor 


Note that as seen by the program executing in the processor, the mapping for the structure 
S is identical to the little-endian mapping shown in Figure 2-33. From outside of the 
processor, the addresses of the bytes making up the structure S are as shown in Figure 2-34. 
These addresses match neither the big-endian mapping of Figure 2-32 or the little-endian 
mapping of Figure 2-33. This must be taken into account when performing I/O operations 
in little-endian mode; this is discussed in Section 2.4.8, “PowerPC Input/Output in Little- 
Endian Mode.” 


2.4.6.2 Misaligned Scalars 


Performing an XOR operation on the low-order bits of the address of a scalar requires the 
scalar to be aligned on a boundary equal to a multiple of its length. When executing in little- 
endian mode (LM = 1), the MPC601 takes an alignment exception whenever any of the load 
and store instructions listed in Table 2-28 is issued with a misaligned EA, regardless of 
whether such an access could be handled without causing an exception in big-endian mode 
(LM = (0). 
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The PowerPC architecture defines that half words, words, and double words be placed in 
memory such that the little-endian address of the lowest-order bit is the EA computed by 
the load or store instruction; the little-endian address of the next-lowest-order byte is one 

greater, and so on. Figure 2-36 shows a four-byte word stored at little-endian address 5. The 

word is presumed to contain the binary representation of x’11121314’. 


13 
06 





Figure 2-36. PowerPC Little-Endian Mode, Word Stored at Address 5 


Figure 2-37 shows the same word stored by a little-endian program, as seen by the memory 
system (assuming big-endian mode). 


00 


08 





Figure 2-37. Word Stored at Little-Endian Address 5 as Seen by Big-Endian 
Addressing 


- Note that the misaligned word in this example spans two double words. The two parts of 
the misaligned word are not contiguous in the big-endian addressing space. 


An implementation may choose to support only a subset of misaligned little-endian 
memory accesses. For example, misaligned little-endian accesses contained within a single 
double word may be supported, while those that span double words may cause alignment 
exceptions. 


2.4.6.3 Non-Scalars 


The PowerPC architecture has two types of instructions that handle non-scalars (multiple 
instances of scalars). Neither type can deal with the modified EAs required in little-endian 
mode and both types cause alignment exceptions. 


2.4.6.3.1 String Operations 


The load and store string instructions, listed in Figure 2-31, cause alignment exceptions 
when they are executed in little-endian mode (HIDO[LM] = 1) 
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Table 2-30. Load/Store String Instructions that Take Alignment Exceptions if LM = 1 


Load String Word Immediate 
Load String Word Indexed 


Store String Word Immediate 
Store String Word Indexed 
Load String and Compare Byte Indexed 


String accesses are inherently misaligned; they transfer word-length quantities between 
memory (cache) and registers, but the quantities are not necessarily aligned on word 
boundaries. 





Note that the system software must determine whether to emulate the excepting instruction 
or treat it as an illegal operation. Because little-endian mode programs are new with respect 
to the PowerPC architecture—that is, they are not POWER binaries—having the compiler 
generate these instructions in little-endian mode would be slower than processing the string 
in-line or by using a subroutine call. 


2.4.6.3.2 Load and Store Multiple Instructions 


The following instructions cause alignment exceptions when executed in little-endian 
mode (HIDO[LM] = 1). 


Table 2-31. Load/Store Multiple Instructions that Take Alignment Exceptions if 
LM=1 


pimw Load Multiple Word 
|stmw | Store Multiple Word 


Although the words addressed by these instructions are on word boundaries, each word is 
in the half of its containing double word opposite from where it would be in big-endian 
mode. 










Note that the system software must determine whether to emulate the excepting instruction 
or treat it as an illegal operation. Because little-endian mode programs are new with respect 


‘ to the PowerPC architecture—that is, they are not POWER binaries—having the compiler 


generate these instructions in little-endian mode would be slower than processing the string 
in-line or by using a subroutine call. 
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2.4.7 PowerPC Instruction Memory Addressing in Little-Endian 
Mode 


Each PowerPC instruction occupies 32 bits (one word) of memory. PowerPC processors 
fetch and execute instructions as if the current instruction address had been advanced one 
word for each sequential instruction. When operating with LM = 1, the address is modified 
according to the little-endian rule for fetching word-length scalars; that is, it is XORed with 
b’100’. A program is thus an array of little-endian words with each word fetched and 
executed in order (not including branches). 


Consider the following example: 


loop: 
cmplwi 5,0 
beq done 
Iwzux 14, r5, 16 
add 17,17, 14 
subi r5, 1 
b loop 
done: 
stw r7, total 


Assuming the program starts at address 0), these instructions are mapped into memory for 
big-endian execution as shown in Figure 2-38. 


loop: cmplwi r5, 8 beq done 

00 Of 02 03 
Iwzux r4, r5, 16 add 17, r7, 14 

08 09 OA OB 

subi 5, 1 b loop 

10 #1 ,12 13 
done: stw r7, total 

18 #19 1A 1B/1C 1D 1E 1F 








Figure 2-38. PowerPC Big-Endian, Instruction Sequence as Seen by Processor 


If this same program is assembled for and executed in little-endian mode, the mapping seen 
by the processor appears as shown in Figure 2-39. 


Each machine instruction appears in memory as a 32-bit integer containing the value 
described in the instruction description, regardless of whether LM is set. This is because 
scalars are always mapped in memory in big-endian byte order. 
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Figure 2-39. PowerPC Little-Endian, Instruction Sequence as Seen by Processor 


When little-endian mapping is used, all references to the instruction stream must follow 
little-endian addressing conventions, including addresses saved in system registers when 
the exception is taken, return addresses saved in the link register, and branch displacements 
and addresses. 


¢ An instruction address placed in the link register by branch and link, or an 
instruction address saved in an SPR when an exception is taken is the address that a 
program executing in little-endian mode would use to access the instruction as a 
word of data using a load instruction. 


¢ An offset in a relative branch instruction reflects the difference between the 
addresses of the instructions, where the addresses used are those that a program 
executing in little-endian mode would use to access the instructions as data words 
using a load instruction. 


¢ A target address in an absolute branch instruction is the address that a program 
executing in little-endian mode would use to access the target instruction as a word 
of data using a load instruction. 


2.4.8 PowerPC Input/Output in Little-Endian Mode 


Input/output operations, such as writing the contents of a memory page to disk, transfers a 
byte stream on both big- and little-endian systems. For the disk transfer, byte 0) of the page 
is written to the first byte of a disk record and so on. 


For a PowerPC system running in big-endian mode, both the processor and the memory 
subsystem recognize the same byte as byte 0. However, this is not true for a PowerPC 
system running in little-endian mode because of the modification of the three low-order bits 


- . when the processor accesses memory. 


In order for I/O transfers in little-endian mode to appear to transfer bytes properly, they 
must be performed as if the bytes transferred were accessed one at a time, using the little- 
endian address modification appropriate for the single-byte transfers (XOR the bits with 
b’111’. This does not mean that I/O on little-endian PowerPC machines must be done using 
only one-byte-wide transfers. Data transfers can be as wide as desired, but the order of the 
bytes within double words must be as if they were fetched or stored one at a time. 
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Note that not all I/O operations performed in PowerPC systems is for large blocks as 
described above. I/O operations can be performed with certain devices by merely storing 
to or loading from addresses that are associated with the devices (this is referred to as I/O 
controller interface operations). Care must be taken with such operations when defining the 
addresses to be used because these addresses are subjected to the EA modifications 
described in Table 2-29. A load or store that maps to a control register on a device may 
require the bytes of the value transferred to be reversed. If this reversal is required, the loads 
and stores with byte reversal instructions may be used. 


2.4.9 Floating-Point Execution Models 


The IEEE-754 standard includes 32-bit and 64-bit arithmetic. The standard requires that 
single-precision arithmetic be provided for single-precision operands. The standard permits 
double-precision arithmetic instructions to have either (or both) single-precision or double- 
precision operands, but states that single-precision arithmetic instructions should not 
accept double-precision operands. 


The PowerPC architecture follows these guidelines: 
¢ Double-precision arithmetic instructions can have operands of either or both 
precisions 
¢ Single-precision arithmetic instructions require all operands to be single-precision 
¢ Double-precision arithmetic instructions produce double-precision values 
¢ Single-precision arithmetic instructions produce single-precision values 


For arithmetic instructions, conversions from double- ‘to single-precision must be done 
explicitly by software, while conversions from single- to double-precision are done 
implicitly. 


All implementations of the PowerPC architecture provide the equivalent of the following 
execution models to ensure that identical results are obtained. Definition of the arithmetic 
instructions for infinities, denormalized numbers, and NaNs follow conventions described 
in following sections. 


Although the double-precision format specifies an 11-bit exponent, exponent arithmetic 
uses two additional bit positions to avoid potential transient overflow conditions. An extra 
bit is required when denormalized double-precision numbers are prenormalized. A second 
bit is required to permit computation of the adjusted exponent value in the following cases 
when the corresponding exception enable bit is one: 


¢ Underflow during multiplication using a denormalized factor. 
¢ Overflow during division using a denormalized divisor. 


2.4.9.1 Execution Model for IEEE Operations 


The following description uses 64-bit arithmetic as an example; 32-bit arithmetic is similar 
except that the fraction field is a 23-bit field and the single-precision guard, round, and 
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sticky bits (described in this section) are logically adjacent to the 23-bit FRACTION (or 
mantissa) field. 


The bits and fields for the IEEE 64-bit execution model are defined as follows: 
¢ The S bit is the sign bit. 
¢ The C bit is the carry bit that captures the carry out of the significand. 


¢ The L bit is the leading unit bit of the pennee which receives the implicit bit from 
the operands. 


¢ The FRACTION is a 52-bit field, which accepts the fraction (mantissa) of the 
operands. 


¢ The guard (G), round (R), and sticky (X) bits are extensions to the low-order bits of 
the accumulator. The G and R bits are required for post normalization of the result. 
The G, R, and X bits are required during rounding to determine if the intermediate 
result is equally near the two nearest representable values. The X bit serves as an 
extension to the G and R bits by representing the logical OR of all bits that may 
appear to the low-order side of the R bit, either due to shifting the accumulator right 
or other generation of low-order result bits. The G and R bits participate in the left 
shifts with zeros being shifted into the R bit. Table 2-32 shows the significance of 
the G, R, and X bits with respect to the intermediate result (IR), the next lower in 
magnitude representable number (NL), and the next higher in magnitude 
representable number (NH). 


Table 2-32. Interpretation of G, R, and X Bits 


pela|x Interpretation 








The significand of the intermediate result is made up of the L bit, the FRACTION, and the 
G, R, and X bits. 


The infinitely precise intermediate result of an operation is the result normalized in bits L, 
FRACTION, G, R, and X of the floating-point accumulator. 


Before results are stored into an FPR, the significand is rounded if necessary, using the 
rounding mode specified by FPSCR[RN]. If rounding causes a carry into C, the significand 
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is shifted right one position and the exponent is incremented by one. This may yield an 
inexact result and possibly exponent overflow. Fraction bits to the left of the bit position 
used for rounding are stored into the FPR, and low-order bit positions, if any, are set to zero. 


Four rounding modes are provided which are user-selectable through FPSCR[RN] as 
described in Section 2.4.9.6, “Rounding.” For rounding, the conceptual guard, round, and 
sticky bits are defined in terms of accumulator bits. 


Table 2-33 shows the positions of the guard, round, and sticky bits for double-precision and 
single-precision floating-point numbers. 


Table 2-33. Location of the Guard, Round and Sticky Bits 


[sm [ome [tome [ne 


eC 


Rounding can be treated as though the significand were shifted right, if required, until the 
least significant bit to be retained is in the low-order bit position of the FRACTION. If any 
of the guard, round, or sticky bits are non-zero, the result is inexact. 





Z1 and Z2, defined in Section 2.4.9.6, “Rounding,” can be used to approximate the result 
in the target format when one of the following rules is used: 
¢ Round to nearest 
— Guard bit = 0: The result is truncated. (Result exact (GRX = 000) or closest to 
next lower value in magnitude (GRX = 001, 010, or 011) 
— Guard bit = 1: Depends on round and sticky bits: 


Case a: If the round or sticky bit is one (inclusive), the result is incremented. 
(result closest to next higher value in magnitude (GRX = 101, 110, or 111)) 


Case b: If the round and sticky bits are zero (result midway between closest 
representable values) then if the low-order bit of the result is one, the result is 
incremented. Otherwise (the low-order bit of the result is zero) the result is 
truncated (this is the case of a tie rounded to even). 


¢ If during the round to nearest process, truncation of the unrounded number produces 
the maximum magnitude for the specified precision, the following action is taken: 


— Guard bit = 1: Store infinity with the sign of the unrounded result. 
— Guard bit = 0: Store the truncated (maximum magnitude) value. 
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¢ Round toward zero—Choose the smaller in magnitude of Z1 or Z2. If the guard, 
round, or sticky bit is non-zero, the result is inexact. 


¢ Round toward +infinity 
Choose Z1. 


¢ Round toward -infinity 
Choose Z2. 


Where the result is to have fewer than 53 bits of precision because the instruction is 
a floating round to single-precision or single-precision arithmetic instruction, the 
intermediate result either is normalized or is placed in correct denormalized form 
before the result is potentially rounded. 


2.4.9.1.1 Execution Model for Multiply-Add Type Instructions 

The PowerPC architecture makes use of a special form of instruction that performs up to 
three operations in one instruction (a multiply, an add, and a negate). With this added 
capability is the special feature of being able to produce a more exact intermediate result as 
an input to the rounder. The 32-bit arithmetic is similar except that the fraction field is 
smaller. Note that the rounding occurs only after add; therefore, the computation of the sum 
and product together are infinitely precise before the final result is rounded to a 
representable format. 


The first part of the operation is a multiply. The multiply has two 53-bit significands as 
inputs, which are assumed to be prenormalized, and produces a result conforming to the 
above model. If there is a carry out of the significand (into the C bit), the significand is 
shifted right one position, placing the L bit into the most significant bit of the FRACTION 
and placing the C bit into the L bit. All 106 bits (L bit plus the fraction) of the product take 
part in the add operation. If the exponents of the two inputs to the adder are not equal, the 
significand of the operand with the smaller exponent is aligned (shifted) to the right by an 
amount added to that exponent to make it equal to the other input's exponent. Zeros are 
shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the 
significand are ORed into the X' bit. The add operation also produces a result conforming 
to the above model with the X' bit taking part in the add operation. 


The result of the add is then normalized, with all bits of the add result, except the X' bit, 
participating in the shift. The normalized result provides an intermediate result as input to 
the rounder that conforms to the model described in Section 2.4.9.1, “Execution Model for 
IEEE Operations,” where: 

¢ The guard bit is bit 53 of the intermediate result. 

¢ The round bit is bit 54 of the intermediate result. 

¢ The sticky bit is the OR of all remaining bits to the right of bit 55, inclusive. 


If the instruction is floating negative multiply-add or floating negative multiply-subtract, 
the final result is negated. 
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Status bits are set to reflect the result of the entire operation: for example, no status is 
recorded for the result of the multiplication part of the operation. 


2.4.9.2 Floating-Point Data Format 

The PowerPC architecture defines the representation of a floating-point value in two 
different binary, fixed-length formats. The format may be a 32-bit format for a single- 
precision floating-point value or a 64-bit format for a double-precision floating-point value. 
The single-precision format may be used for data in memory. The double-precision format 
can be used for data in memory or in floating-point registers. 


The length of the exponent and the fraction fields differ between these two precision 
formats. The structure of the single-precision format is shown in Figure 2-40; the structure 
of the double-precision format is shown in Figure 2-41. 


EXP FRACTION 
0 


1 8 9 31 


Figure 2-40. Floating-Point Single-Precision Format 


EXP FRACTION 
0 


1 1112 63 
Figure 2-41. Floating-Point Double-Precision Format 


Values in floating-point format consist of three fields: 
° S (sign bit). 
¢ EXP (exponent+bias) 
¢ FRACTION (fraction) 


If only a portion of a floating-point data item in memory is accessed, as with a load or store 
instruction for a byte of halfword (or word in the case of floating-point double-precision 
format), the value affected depends on whether the PowerPC system is using big- or little- 
endian byte ordering, which is described in Section 2.4.3, “Byte and Bit Ordering.” Big- 
endian mode is the default. 


The significand consists of a leading implied bit concatenated on the right with the 
FRACTION. This leading implied bit is a 1 for normalized numbers and a 0 for 
denormalized numbers in the unit bit position (that is, the first bit to the left of the binary 
point). Values representable within the two Acating-point formats can be specified by the 
parameters listed in Table 2-34. 
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Table 2-34, IEEE Floating-Point Fields 


Parameter Single-Precision Double-Precision 


(unbiased) 


The exponent is expressed as an 8-bit value for single-precision numbers or an 11-bit value 
for double-precision numbers. These bits hold the biased exponent; the true value of the 
exponent can be determined by subtracting 127 for single-precision numbers and 1023 for 
double-precision values. This is shown in Figure 2-42. Note that using a bias eliminates the 
need for a sign bit. The highest-order bit is used both to generate the number, and is an 
implicit sign bit. Note also that two values are reserved—all bits set indicates that the 
number is an infinity or NaN and all bits cleared indicates that the number is either zero or 
denormalized. 















2.4.9.2.1 Value Representation 


The PowerPC architecture defines numerical and non-numerical values representable 
within single- and double-precision formats. The numerical values are approximations to 
the real numbers and include the normalized numbers, denormalized numbers, and zero 
values. The non-numerical values representable are the positive and negative infinities, and 
the NaNs. The positive and negative infinities are adjoined to the real numbers but are not 
numbers themselves, and the standard rules of arithmetic do not hold when they appear in 
an operation. They are related to the real numbers by “order” alone. It is possible however 
to define restricted operations among numbers and infinities as defined below. The relative 
location on the real number line for each of the defined entities is shown in Figure 2-43. 
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Biased Exponent Single-Precision Double-Precision 
(binary) (unbiased) (unbiased) 


Reserved for Infinities and NaNs 


_ 
a 
—_ 
_ 


WMesee. 10 +127 +1023 







We... 01 +126 +1022 


Positive 


ee eee 





Zero yw! ~sS—OCéOdz«..... 


= 


Negative 


-126 -1022 


’ 
_ 


Reserved for Zeros and Denormalized Numbers 


Figure 2-42. Biased Exponent Format 


Unrepresentable, small numbers yO 


-INF -NORM | -DENORM | -0 +0 







+DENORM +INF 








Tiny Tiny 


Figure 2-43. Approximation to Real Numbers 
The positive and negative NaNs are not related to the numbers or +e° by order or value, but 


they are encodings that convey diagnostic information such as the representation of 
uninitialized variables. Table 2-35 describes each of the floating-point formats. 


Table 2-35. Recognized Floating-Point Numbers 


a 
EE 
rs) 


(anaes 
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Table 2-35. Recognized Floating-Point Numbers (Continued) 


ee el 
Sens aera reel es 


The following sections describe floating-point values defined in the architecture: 














2.4.9.2.2 Binary Floating-Point Numbers 

Binary floating-point numbers are machine-representable values used to approximate real 
numbers. Three categories of numbers are supported: normalized numbers, denormalized 
numbers, and zero values. 


2.4.9.2.3 Normalized Numbers (tNORM) 
The values for normalized numbers have a biased exponent value in the range: 


¢ 1-254 in single-precision format 
¢ 1-2046 in double-precision format 


The implied unit bit is one. Normalized numbers are interpreted as follows: 


NORM = (-1)® x 2 x (1.fraction) 


where (s) is the sign, (E) is the unbiased exponent and (1.fraction) is the significand 
composed of a leading unit bit (implied bit) and a fractional part. The format for normalized 
numbers is shown in Figure 2-44. 


ez MINSEYBIASED) MANTISSA=ANY BIT PATTERN 


SIGN OF MANTISSA, 0 OR 1 


Figure 2-44. Format for Normalized Numbers 


The ranges covered by the magnitude (M) of a normalized floating-point number are 
approximately equal to the following: 


Single-precision format: 
1.2x10°8 < M <3.4x10%8 
Double-precision format: 


2.2x10°3°8 < M < 1.8x103%8 
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2.4.9.2.4 Zero Values (+0) 


Zero values have a biased exponent value of zero and a fraction value of zero. This is shown 
in Figure 2-45. Zeros can have a positive or negative sign. The sign of zero is ignored by 
comparison operations (that is, comparison regards +() as equal to -0). 


i EN BIASED) MANTISSA=0 





SIGN OF MANTISSA, 0 OR 1 
Figure 2-45. Format for Zero Numbers 


2.4.9.2.5 Denormalized Numbers (+{DENORM) 


Denormalized numbers have a biased exponent value of zero and a non-zero fraction value. 
The format for denormalized numbers is shown in Figure 2-46. 


EXPONENT=0 MANTISSA=ANY NON-ZERO 
(BIASED) BIT PATTERN 





SIGN OF MANTISSA, 0 OR 1 


Figure 2-46. Format for Denormalized Numbers 


Denormalized numbers are non-zero numbers smaller in magnitude than the representable 
normalized numbers. They are values in which the implied unit bit is zero. Denormalized 
numbers are interpreted as follows: 


DENORM = (-1)® x 2©™!" x ((.fraction) 


Emin is the minimum representable exponent value (-126 for single-precision, -1022 for 
double-precision). 


2.4.9.2.6 Infinities (+c) 
Positive and negative infinities have the maximum biased exponent value: 


* 255 in the single-precision format 
¢ 2047 in the double-precision format 


The format for infinities is shown in Figure 2-47. 


EXPONENT=MAXIMUM 
al (BIASED) MANTISSA=0 


SIGN OF MANTISSA, 0 OR 1 





Figure 2-47. Format for Positive and Negative Infinities 


The fraction value is zero. Infinities are used to approximate values greater in magnitude 
than the maximum normalized value. Infinity arithmetic is defined as the limiting case of 
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real arithmetic, with restricted operations defined between numbers and infinities. Infinities 
and the reals can be related by ordering in the affine sense: 


-co < every finite number < +e0 


Arithmetic using infinite numbers is always exact and does not signal any exception, except 
when an exception occurs due to the invalid operations as described in Section 5.4.7.2, 
“Invalid Operation Exception Conditions.” 


2.4.9.2.7 Not a Numbers (NaNs) 

NaNs have the maximum biased exponent value and a non-zero fraction value. The format 
for NaNs is shown in Figure 2-48. The sign bit of NaNs is ignored (that is, NaNs are neither 
positive nor negative). If the high-order bit of the fraction field is a zero, the NaN is a 
signaling NaN; otherwise it is a quiet NaN (QNaN). 


EXPONENT=MAXIMUM MANTISSA=ANY NON-ZERO 
(BIASED) BIT PATTERN 
SIGN OF MANTISSA (0 for +NaN); 1 for -NaN) 
Figure 2-48. Format for NANs 


Signaling NaNs signal exceptions when they are specified as arithmetic operands. 


Quiet NaNs represent the results of certain invalid operations, such as invalid arithmetic 
operations on infinities or on NaNs, when the invalid operation exception is disabled 
(FPSCR[VE]=0). QNaNs are generated under the following conditions: 


¢ An invalid operation occurs and FPSCR[VE] = 0 


¢ An mffs instruction is executed and the upper 32 bits are undefined (only in the 
MPC601). 


¢ On Floating Convert to Integer with Round (fctir) and Floating Convert to Integer 
with Round toward Zero (fctirz) the PowerPC architecture defines bits 0-31 of the 
target floating point register as undefined. In the MPC601, these bits take on the 
value x'FFF8 0000' (which is the representation for a QNaN). 


Quiet NaNs propagate through all operations, except ordered comparison and conversion 
to integer operations without signalling exceptions. Specific encodings in QNaNs can thus 
be preserved through a sequence of operations and used to convey diagnostic information 
to help identify results from invalid operations. 


When a QNaN results from an operation because an operand is a NaN or because a QNaN 
is generated due to a disabled invalid operation exception, the following rule is applied to 
determine the QNaN with the high-order fraction bit set to one that is to be stored as the 
result: . 


If (frA) is a NaN 
Then frD <— (frA) 
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Else if (frB) is a NaN 
Then frD <— (frB) 
Else if (frC) is a NaN 
Then frD <— (frC) 
Else if generated QNaN 
Then frD < generated QNaN 


If the operand specified by frA is a NaN, that NaN is stored as the result. Otherwise, if the 
operand specified by frB is a NaN (if the instruction specifies an frB operand), that NaN is 
stored as the result. Otherwise, if the operand specified by frC is a NaN (if the instruction 
specifies an frC operand), that NaN is stored as the result. Otherwise, if a QNaN is 
generated by a disabled invalid operation exception, that QNaN is stored as the result. If a 
QNaN is to be generated as a result, the QNaN generated has a sign bit of zero, an exponent 
field of all ones, and a high-order fraction bit of one with all other fraction bits zero. An 
instruction that generates a QNaN as the result of a disabled invalid operation generates this 
QNaN. This is shown in Figure 2-49. 


ie 111...1 1000....0 


SIGN OF MANTISSA, NaN) OR 1 
Figure 2-49. Representation of QNaN 
2.4.9.3 Sign of Result 


The following rules govern the sign of the result of an arithmetic operation, when the 
operation does not yield an exception. These rules apply even when the operands or results 
are +0) or too: 


¢ The sign of the result of an addition operation is the sign of the source operand 
having the larger absolute value. The sign of the result of the subtraction operation, 
x-y, is the same as the sign of the result of the addition operation, x+(-y). 


* When the sum of two operands with opposite sign, or the difference of two operands 
with the same sign, is exactly zero, the sign of the result is positive in all rounding 
modes except round toward negative Infinity(-c°), in which case the sign is negative. 


¢ The sign of the result of a multiplication or division operation is the exclusive OR 
of the signs of the source operands. 


¢ The sign of the result of a round to single-precision or convert to/from integer 
operation is the sign of the source operand. 


For multiply-add instructions, these rules are applied first to the multiplication operation 
and then to the addition or subtraction operation (one of the source operands to the addition 
or subtraction operation is the result of the multiplication operation). 
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2.4.9.4 Normalization and Denormalization 

When an arithmetic operation produces an intermediate result, consisting of a sign bit, an 
exponent, and a non-zero significand with a zero leading bit, the result is not a normalized 
number and must be normalized before it is stored. 


A number is normalized by shifting its significand left while decrementing its exponent by 
one for each bit shifted, until the leading significand bit becomes one. The guard bit and the 
round bit participate in the shift with zeros shifted into the round bit; see Section 2.4.9.1, 
“Execution Model for IEEE Operations.” During normalization, the exponent is regarded 
as if its range were unlimited. If the resulting exponent value is less than the minimum 
value that can be represented in the format specified for the result, the intermediate result 
is said to be “tiny” and the stored result is determined by the rules described in Section 
5.4.7.5, “Underflow Exception Condition.” The sign of the number does not change. 


When an arithmetic operation produces a non-zero intermediate result whose exponent is 
less than the minimum value that can be represented in the format specified, the stored 
result may need to be denormalized. The result is determined by the rules described in 
Section 5.4.7.5, “Underflow Exception Condition.” 


A number is denormalized by shifting its significand to the right while incrementing its 
exponent by one for each bit shifted until the exponent equals the format's minimum value. 
If any significant bits are lost in this shifting process then “Loss of Accuracy” has occurred 
and an underflow exception is signaled. The sign of the number does not change. 


When denormalized numbers are operands of multiply and divide operations, operands are 
prenormalized internally before performing the operations. 


2.4.9.5 Data Handling and Precision 


There are specific instructions for moving floating-point data between the FPRs and 
memory. For double-precision format data, the data is not altered during the move. For 
single-precision data, the format is converted to double-precision format when data is 
loaded from memory into an FPR. A format conversion from double- to single-precision is 
performed when data from an FPR is stored. Floating-point exceptions cannot occur during 
these operations. 


All arithmetic operations use floating-point double-precision format. 


Floating-point single-precision formats are used by the following four types of instructions: 


¢ Load Floating-Point Single-Precision (Ifs)—This instruction accesses a single- 
precision operand in single-precision format in memory, converts it to double- 
precision, and loads it into an FPR. Exceptions are not detected during the load 
operation. 


¢ Round to floating-point single-precision—If the operand is not already in single- 
precision range, the floating round to single-precision instruction rounds a double- 
precision operand to single-precision, checking the exponent for single-precision 
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range and handling any exceptions according to respective enable bits in the FPSCR. 
The instruction places that operand into an FPR as a double-precision operand. For 
results produced by single-precision arithmetic instructions and by single-precision 
loads, this operation does not alter the value. 


¢ Single-precision arithmetic instructions—These instructions take operands from the 
FPRs in double-precision format, performs the operation as if it produced an 
intermediate result correct to infinite precision and with unbounded range, and then 
forces this intermediate result to fit in single-precision format. Status bits in the 
FPSCR and in the condition register are set to reflect the single-precision result. The 
result is then converted to double-precision format and placed into an FPR. The 
result falls within the range supported by the single format. 


* For single-precision operations, source operands must be representable in single- 
precision format. If they are not, the result placed into the target FPR, and the setting 
of status bits in the FPSCR and in the condition register, are undefined. 


¢ Store Floating-Point Single-Precision (stfs)—This form of instruction converts a 
double-precision operand to single-precision format and stores that operand into 
memory. If the operand requires denormalization in order to fit in single-precision 
format, it is automatically denormalized prior to being stored. No exceptions are 
detected on the store operation (the value being stored is effectively assumed to be 
the result of an instruction of one of the preceding three types). 


When the result of a Load Floating-Point Single-Precision (Ifs), Floating-Point Round to 
Single-Precision (frspx), or single-precision arithmetic instruction is stored in an FPR, the 
low-order 29 fraction bits are zero. This is shown in Figure 2-50. 


Bit 35 
EXP XXXXXXXXXXXXXXXXXXXXxXX x 00000000000000000000000000000 


0 1 1112 63 
Figure 2-50. Single-Precision Representation in an FPR 


The Floating-Point Round To Single-Precision (frpsx) instruction allows conversion from 
double- to single-precision with appropriate exception checking and rounding. This 
instruction should be used to convert double-precision floating-point values (produced by 
double-precision load and arithmetic instructions) to single-precision values before storing 
them into single-format memory elements or using them as operands for single-precision 
arithmetic instructions. Values produced by single-precision load and arithmetic 
instructions can be stored directly, or used directly as operands for single-precision 
arithmetic instructions, without preceding the store, or the arithmetic instruction, by frspx. 


A single-precision value can be used in double-precision arithmetic operations. The reverse 
is true only if the double-precision value can be represented in single-precision format. 
Some implementations may execute single-precision arithmetic instructions faster than 
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double-precision arithmetic instructions. Therefore, if double-precision accuracy is not 
required, using single-precision data and instructions can speed operations. 


2.4.9.6 Rounding 


All arithmetic instructions defined by the PowerPC architecture produce an intermediate 
result considered infinitely precise. This result must then be written with a precision of 
finite length into an FPR. After normalization or denormalization, if the infinitely precise 
intermediate result cannot be represented in the precision required by the instruction, it is 
rounded before being placed into the target FPR. 


The instructions that potentially round their result are the arithmetic, multiply-add, and 
rounding and conversion instructions. As shown in Figure 2-51, whether rounding occurs 
depends on the source values. 


Rounding 





Figure 2-51. Rounding Flow Diagram 


Each of these instructions sets FPSCR bits FR and FI, according to whether rounding 
occurs (FI) and whether the fraction was incremented (FR). If rounding occurs, FI is set to 
one and FR may be either zero or one. If rounding does not occur, both FR and FI are 
cleared. Other floating-point instructions do not alter FR and FI. Four modes of rounding 
are provided that are user-selectable through the floating-point rounding control field in the 
FPSCR. See Section 2.2.3, “Floating-Point Status and Control Register (FPSCR).” These 
are encoded as follows in Table 2-36. 


Let Z be the infinitely precise intermediate arithmetic result or the operand of a conversion 
operation. If Z can be represented exactly in the target format, no rounding occurs and the 
result in all rounding modes is equivalent to truncation of Z. If Z cannot be represented 
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Table 2-36. FPSCR Bit Settings—RN Field 


| oo Round to nearest 
| or Round toward zero 











Round toward +infinity 
Round toward -infinity 


exactly in the target format, let Z1 and Z2 be the next larger and next smaller numbers 
representable in the target format that bound Z; then Z1 or Z2 can be used to approximate 
the result in the target format. 








Figure 2-52 shows a graphical representation of Z, Z1, and Z2 in this case and Figure 2-53 
shows the selection of Z1 and Z2 for the four rounding settings. 


By incrementing LSB of Z 
Infinitely precise value 
By truncating after LSB 














0 Z2 


Negative values ~—_|—_~ Positive values 


Figure 2-52. Relation of Z1 and Z2 


Z1 





Z 


Rounding follows the four following rules: 


* Round to nearest—Choose the best approximation (Z1 or Z2. In case of a tie, choose 
the one which is even (least significant bit 0)). 


' © Round toward zero—Choose the smaller in magnitude (Z1 or Z2). 
¢ Round toward +infinity—Choose Z1. 
¢ Round toward -infinity—Choose Z2. 


See Section 2.4.9.1, “Execution Model for IEEE Operations,” for a detailed explanation of 
rounding. If Z is to be rounded up and Z1 does not exist (that is, if there is no number larger 
than Z that is representable in the target format), then an overflow exception occurs if Z is 
positive and an underflow exception occurs if Z is negative. Similarly, if Z is to be rounded 
down and Z2 does not exist, then an overflow exception occurs if Z is negative and an 
underflow exception occurs if Z is positive. The results in these cases are defined in Section 
5.4.7.1, “Floating-Point Enabled Program Exceptions.” 
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Z is infinitely 
precise result 
or operand 


Does Z fit Yes 


target format? 
No 
Z1<Z<Z2 
No 


Round 
toward +007? 


No 


Round 
toward +007 


No 


Round 
toward 0? 


No 


Rounding = Truncation 


Yes 
Choose Z2 


Yes 
Choose Z1 


Yes 
Choose Z1 


Round Choose best approxi- 
to nearest mation (Z1 or Z2) 


if tie 


Choose even value (Z1 
or Z2 whose Isb is 0) 


Figure 2-53. Selection of Z1 and Z2 


2.5 Unimplemented PowerPC Registers 
The following PowerPC registers are not implemented in the MPC6()1: 


¢ The time base SPRs are used in the PowerPC architecture instead of the RTC 
registers. The architected time base facility operates as a subdivision of the 
frequency provided by the processor clock. 


¢ Floating-point exception cause register (FPECR)—This is a supervisor-level SPR 
(1023) that is used by some implementations to determine the cause of a floating- 
point error. 
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¢ Address space register (ASR)—The ASR is a 64-bit SPR used in 64-bit 
implementations to perform address translations. 


¢ Each PowerPC processor implements a unique set of HID registers. Note that some 
of these registers may be implemented the same way in more than one PowerPC 
processor design. 


An mtspr or mfspr instruction that specifies an unimplemented register is treated as a no- 
op. If a privilege violation is indicated, the program exception has priority over the no-op. 
This can occur if a user-mode program tries to access a register with bit () of the SPR 
encoding field (in the instruction format) set. However, in this case the program exception 
is taken regardless of whether the SPR encoding specified an implemented register. 


2.6 Reset 


The following sections describe hard reset and soft reset in the MPC601 processor. For 
more information about the reset exception see Section 5.4.1, “Reset Exceptions 
(x’00100).” 


2.6.1 Hard Reset 


The hard reset sequence begins when the hard reset signal HRESET is negated after being 
driven as described in Section 8.2.9.4.1, “Hard reset (HRESET)—Input.” Note that a hard 
reset operation is required on power-on in order to properly reset the MPC601. 


Table 2-37 shows the state of the registers after a hard reset and before it fetches the first 
instruction from address x’FFFO 0100’ the system reset exception vector. 


Table 2-37. Settings after Hard Reset (Used at Power-On) 
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Table 2-37. Settings after Hard Reset (Used at Power-On) (Continued) 


ee ee ee 


eects 00000000 Tag directory All Os. (However, the LRU 
bits are initialized such 
that each side of the 
cache has a unique LRU 
value.) 


PSRRO 00000000 | 


Notes: ' In the earliest release of the MPC601 (DD1), this is 00010000. Later versions of the hardware may be 
different. 
2 Master checkstop enable on, sequencer GPR self-test checkstop invalid microcode instruction checkstop on. 
3 Note that if external clock is connected to RTC for the MPC601, then the RTCL, RTCU, and DEC can change 
from their initial value of Os without receiving instructions to load those registers. 







The following is also true after a hard reset operation: 
¢ External checkstops are enabled. 


¢ The on-chip COP has given control of the PIs/POs to the rest of the chip for 
functional use. 


¢ Since the reset exception has data and instruction translation disabled (MSR[DT] 
and MSR[IT] both cleared), the chip operates in direct address translation mode. 
This implies that instruction fetches as well as loads and stores are cacheable. 
(Operations that correspond to direct address translations are implicitly cacheable, 
not write-through mode, and require coherency checking on the bus). 


¢ All internal arrays and registers are cleared during the hard reset process. 


2.6.2 Soft Reset 


Registers are not re-initialized when a soft reset occurs (SRESET is asserted as described 
in Section 8.2.9.4.2, “Soft Reset (GRESET)—Input’). The SRRO and SRRI registers are 
updated with instruction and MSR data, and the MSR values are reset according to 
procedures described in Section 5.4.1, “Reset Exceptions (x’00100’).” 





2-72 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 





Chapter 3 
Addressing Modes and Instruction Set 
Summary 


This chapter describes instructions and address modes supported by the MPC60)1 
microprocessor. These instructions are divided into the following categories: 


¢ Integer instructions—These include computational and logical instructions. 


¢ Floating-point instructions—These include floating-point computational 
instructions, as well as instructions that affect the floating-point status and control 
register. 


¢ Load/store instructions—These include integer and floating-point load and store 
instructions. 


¢ Flow control instructions—These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. 


¢ Processor control instructions—These instructions are used for synchronizing 
memory accesses and management of caches, TLBs, and the segment registers. 


Note that this grouping of the instructions does not indicate which execution unit executes 
a particular instruction or group of instructions. This information, which is useful in taking 
full advantage of the MPC601’s superscalar parallel instruction execution, is provided with 
each instruction in Chapter 10, “Instruction Set.” 


Integer instructions operate on byte, half-word, and word operands. Floating-point 
instructions operate on single-precision and double-precision floating-point operands. The 
PowerPC architecture uses instructions that are four bytes long and word-aligned. It 
provides for byte, half-word, and word operand fetches and stores between memory and a 
set of 32 general-purpose registers (GPRs). It also provides for word and double-word 
operand fetches and stores between memory and a set of 32 floating-point registers (FPRs). 


Computational instructions do not modify memory. To use a memory operand in a 
computation and then modify the same or another memory location, the memory contents 
must be loaded into a register, modified, and then written back to the target location. 


The MPC60)1 executes program instructions when it is in the normal execution state. 
However, the flow of instructions can be interrupted directly by the execution of an 
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instruction or by an asynchronous event. Either kind of exception may cause one of several 
components of the system software to be invoked. 


3.1 Memory Addressing 


A program references memory using the effective address computed by the processor when 
it executes a memory access or branch instruction, or when it fetches the next sequential 
instruction. 


3.1.1 Effective Address Calculation 


The effective address is the 32-bit address computed by the processor when executing a 
memory access or branch instruction or when fetching the next sequential instruction. For 
a memory access instruction, if the sum of the effective address and the operand length 
exceeds the maximum effective address, the storage operand is considered to wrap around 
from the maximum effective address to effective address (), as described in the following 
paragraphs. 


Effective address computations for both data and instruction accesses use 32-bit unsigned 
binary arithmetic. A carry from bit 0 is ignored. 
Load and store operations have three categories of effective address generation: 


¢ Register indirect with immediate index mode. The d operand is added to the contents 
of the GPR specified by the rA operand to generate the effective address. 


¢ Register indirect with index mode. The contents of the GPR specified by rB operand 
are added to the contents of the GPR specified by the rA operand to generate the 
effective address. 


¢ Register indirect mode. The contents of the GPR specified by the rA operand are 
used as the effective address. 
Branch instructions have three categories of effective address generation: 


¢ Immediate addressing. The BD or LI operands are sign extended with the two low- 
order bits cleared to zero to generate the branch effective address. 


¢ Link register indirect. The contents of the link register with the two low-order bits 
cleared to zero are used as the branch effective address. 


¢ Counter register indirect. The contents of the counter register with the two low-order 
bits cleared to zero are used as the branch effective address. 


Branch instructions can optionally load the link register with the next sequential instruction 
address (current instruction address + 4). 


3.1.2 Context Synchronization 


The System Call (sc), Return from Interrupt (rfi), and Move to Machine State Register 
(mtmsr) instructions perform context synchronization by allowing previously issued 
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instructions to complete before performing a context switch. Execution of one of these 
instructions ensures the following: 


3.2 


No higher priority exception exists. 


All previous instructions have completed to a point where they can no longer cause 
an exception. If a prior memory access instruction causes direct-store error 
exceptions, the results must be determined before this instruction is executed. 


Previous instructions complete execution in the context (privilege, protection, and 
address translation) under which they were issued. 


The instructions following the sc, rfi, and mtmsr instruction execute in the context 
established by these instruction. 


Exception Summary 


There are two kinds of exceptions in the MPC601—those caused directly by the execution 
of an instruction and those caused by an asynchronous event. Either kind of exception 
causes one of several components of the system software to be invoked. 


Exceptions can be caused directly by the execution of an instruction in the following 
situations: 


An attempt to execute an illegal instruction or an attempt by an application program 
to execute a supervisor-level instruction causes the illegal instruction or supervisor- 
level instruction handler to be invoked. 


An attempt to access memory in a manner that violates memory protection causes 
the data access exception handler or instruction access exception handler to be 
invoked. 


An attempt to access memory with an effective address alignment that is invalid for 
the instruction causes the alignment exception handler to be invoked. 


The execution of an sc instruction causes the system service program to be invoked. 


The execution of a trap instruction that traps causes the program exception trap 
handler to be invoked. 

The execution of a floating-point instruction when floating-point instructions are 
unavailable causes the floating-point unavailable handler to be invoked. 


The execution of an instruction that causes a floating-point exception that is enabled 
causes the floating-point enabled exception handler to be invoked. 


Exceptions caused by asynchronous events are described in Chapter 5, “Exceptions”. 
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3.3 Integer Instructions 
This section describes the integer instructions. These consist of the following: 


* Integer arithmetic instructions 

* Integer compare instructions 

¢ Integer rotate and shift instructions 
¢ Integer logical instructions. 


Integer instructions use the content of the GPRs as source operands and place results into 
GPRs, into the integer exception register (XER), and into condition register fields. Trap 
instructions compare the contents of one GPR with a second GPR or with immediate data 
and, if the conditions are met, invoke the program exception trap handler. 


These instructions treat the source operands as signed integers unless the instruction is 
explicitly identified as an unsigned operation or an address conversion. 


The integer instructions that are coded to update the condition register and the integer 
logical and arithmetic instructions (addic., andi., and andis.) set condition register field 
CRO (bits O—-3) to characterize the result of the operation. The condition register field CRO 
is set as if result were compared algebraically to zero. 


The integer arithmetic instructions (addic, addic., subfic, addc, subfc, adde, subfe, 
addme, subfme, addze, and subfze) always set integer exception register bit CA to reflect 
the carry out of bit 0. Integer arithmetic instructions with the overflow enable (OE) bit set 
will cause the XER bits SO and OV to be set to reflect overflow of the 32-bit result. 


Unless otherwise noted, when condition register field CRO and the XER are affected they 
reflect the value placed in the target register. 


The MPC601 performs best for aligned load and store operations. See Section 5.4.6, 
“Alignment Exception (x’00600’),” for scenarios that cause an alignment exception. 


3.3.1 Integer Arithmetic Instructions 


In the MPC601 instructions that select the overflow option (enable XER(OV)) or that set 
the integer exception register carry bit (CA may delay the execution of subsequent 
instructions. 


The MPC601 integer unit defines one additional register to the user register set and 
programming model that is not present in other PowerPC implementations. The MQ 
register is a 32-bit register whose primary use is to provide a register extension to 
accommodate the product for the MPC601-specific Multiply (mul) instruction and the 
dividend for the MPC601-specific Divide (div) instruction. It is also used as an operand of 
long rotate and shift instructions. 


The MQ register is never architecturally modified by any of the instructions defined in the 
PowerPC architecture. However, in the MPC601the MQ register may be modified during 
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the execution of any POWER or PowerPC multiply or divide instruction. The value written 
to the MQ register during these operations is operand dependent. 


Table 3-1 lists the integer arithmetic instructions for the MPC601. Note that some of the 
instructions are specific to the MPC601 implementation. 


Table 3-1. Integer Arithmetic Instructions 
Operand | . 

jane | amamone | rt | pete 

Add rD,rA,SIMM | The sum (rA[0) + SIMM is placed into register rD. 

Immediate 

Add addis rD,rA,SIMM_ | The sum (rA|0) + (SIMM |] x ‘0000’ is placed into register rD. 
Immediate 
D 


Shifted 
rD,rA,rB The sum (rA) + (rB) is placed into register rD. 



































add Add 

add. Add with CR Update. The dot suffix enables the update of 
the condition register. 

addo Add with Overflow Enabled. The o suffix enables the 
overflow bit (OV) in the XER. 

addo. Add with Overflow and CR Update. The o. suffix enables 
the update of the condition register and enables the 

overflow bit (OV) in the XER. 





The sum —(rA) + (rB) +1 is placed into rD. 





Subtract rD,rA,rB 
from subf Subtract from 
subf. Subtract from with CR Update. The dot suffix enables the 
update of the condition register. 
subfo Subtract from with Overflow Enabled. The o suffix enables 
the overflow. The o suffix enables the overflow bit (OV) in 
the XER. 
subfo. Subtract from with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 
Add rD,rA,SIMM | The sum (rA) + SIMMis placed into register rD. 
Immediate 
Carrying 
Add rD,rA,SIMM | The sum (rA) + SIMMis placed into rD. The condition register is 
Immediate updated. 
Carrying 
and Record 


The sum -(rA) + SIMM + 1 is placed into register rD. 


Subtract rD,rA,SIMM 
from 

Immediate 

Carrying 
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Table 3-1. Integer Arithmetic instructions (Continued) 


Operand . 
[tame fmmonc] meme femme 


Add rD,rA,rB The sum (rA) + (rB) is placed into register rD. 
Carrying addc Add Carrying 
addc. Add Carrying with CR Update. The dot suffix enables the 
update of the condition register. 
addco —— Add Carrying with Overflow Enabled. The o suffix enables 
the overflow bit (OV) in the XER. 
addco. Add Carrying with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
















the overflow bit (OV) in the XER. 














Subtract subfe The sum — (fA) + (rB) + 1 is placed into register rD. 
she : serie subfc Subtract from Carrying 
any eutcs subfe. Subtract from Carrying with CR Update. The dot suffix 


enables the update of the condition register. 
subfco Subtract from Carrying with Overflow. The o suffix enables 
the overflow bit (OV) in the XER. 
subfco. Subtract from Carrying with Overflow and CR Update. 
The o. suffix enables the update of the condition register 
and enables the overflow bit (OV) in the XER. 






































Add rD,rA,rB The sum (rA) + (fB) + XER(CA) is placed into register rD. 
Extpnded adde Add Extended 
adde. Add Extended with CR Update. The dot suffix enables the 
update of the condition register. 
addeo Add Extended with Overflow. The o suffix enables the 
overflow bit (OV) in the XER. 
addeo. Add Extended with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 
Subtract rD,rA,rB The sum =(rA) + (rB) + XER(CA) is placed into register rD. 
from 
Extended subfe Subtract from Extended 


subfe. Subtract from Extended with CR Update. The dot suffix 
enables the update of the condition register. 

subfeo Subtract from Extended with Overflow. The o suffix 
enables the overflow bit (OV) in the XER. 

subfeo. Subtract from Extended with Overflow and CR Update. 
The o. suffix enables the update of the condition register 

and enables the overflow (OV) bit in the XER. 

























Add to addme The sum (rA) + XER(CA) + x‘FFFFFFFF’ is placed into register rD. 


sila ah ce. addme __ Add to Minus One Extended 
adams addme. Add to Minus One Extended with CR Update. The dot 


suffix enables the update of the condition register. 
addmeo_ Add to Minus One Extended with Overflow. The o suffix 
enables the overflow bit (OV) in the XER. 
addmeo. Add to Minus One Extended with Overflow and CR 
Update. The o. suffix enables the update of the condition 
register and enables the overflow (OV) bit in the XER. 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand ; 
ee 


Subtract subfme rD,rA The sum -(rA) + XER(CA) + x'FFFFFFFF'’ is placed into register rD. 
from Minus | subfme. 

One subfmeo 
Extended subfmeo. 






























subfme Subtract from Minus One Extended 
subfme. Subtract from Minus One Extended with CR Update. The 
dot suffix enables the update of the condition register. 
subfmeo Subtract from Minus One Extended with Overflow. The o 
suffix enables the overflow bit (OV) in the XER. , 
subfmeo. Subtract from Minus One Extended with Overflow and CR 
Update. The o. suffix enables the update of the condition 
register and enables the overflow bit (OV) in the XER. 





















Add to Zero 
Extended 






addze 
addze. 
addzeo 
addzeo. 


The sum (rA) + XER(CA) is placed into register rD. 


addze _— Add to Zero Extended 

addze. Add to Zero Extended with CR Update. The dot suffix 
enables the update of the condition register. 

addzeo Add to Zero Extended with Overflow. The o suffix enables 
the overflow bit (OV) in the XER. 

addzeo. Add to Zero Extended with Overflow and CR Update. The 
o. suffix enables the update of the condition register and 

enables the overflow bit (OV) in the XER. 


























subfze 
subfze. 

subfzeo 
subfzeo. 


Subtract 
from Zero 
Extended 


The sum —(rA) + XER(CA) is placed into register rD. 


subfze Subtract from Zero Extended 

subfze. Subtract from Zero Extended with CR Update. The dot 
suffix enables the update of the condition register. 

subfzeo Subtract from Zero Extended with Overflow. The o suffix 
enables the overflow bit (OV) in the XER. 

subfzeo. Subtract from Zero Extended with Overflow and CR 

Update. The o. suffix enables the update of the condition 

register and enables the overflow bit (OV) in the XER. 


The sum (FA) + 1 is placed into register rD. 















neg Negate 

neg. Negate with CR Update. The dot suffix enables the update 
of the condition register. 

nego Negate with Overflow. The o suffix enables the overflow bit 







(OV) in the XER. 
nego. Negate with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 










Multiply The low-order 32 bits of the 48-bit product (rA)*SIMM are placed into 
Low register rD. The low-order 32 bits of the product are the correct 32-bit 
Immediate product. The low-order bits are independent of whether the operands 


are treated as signed or unsigned integers. However, XER[OV] is set 
based on the result interpreted as a signed integer. 

The high-order bits are lost. This instruction can be used with 
mulhwx to calculate a fuil 64-bit product. 


4 7 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand : 
frame | mene] mime fermen 


Multiply mullw rD,rA,rB The low-order 32 bits of the 64-bit product (rA)*(rB) are placed into 









































Low mullw. register rD. The low-order 32 bits of the product are the correct 32-bit 
mullwo product. The low-order bits are independent of whether the operands 
mullwo. are treated as signed or unsigned integers. However, XER[OV] is set 


based on the result interpreted as a signed integer. 


The high-order bits are lost. This instruction can be used with 

mulhwx to calculate a full 64-bit product. Some implementations may 
execute faster if rB contains the operand having the smaller absolute 
value. 


mullw = Multiply Low 

muliw. Multiply Low with CR Update. The dot suffix enables the 
update of the condition register. 

mullwo Multiply Low with Overflow. The o suffix enables the 
overflow bit (OV) in the XER. 

mullwo. Multiply Low with Overflow and CR Update. The o. suffix 

enables the update of the condition register and enables 

the overflow bit (OV) in the XER. 















Multiply 
High Word 


The contents of rA and FB are interpreted as 32-bit signed integers. 
The 64-bit product is formed. The high-order 32 bits of the 64-bit 
product are placed into rD. 


Both operands and the product are interpreted as signed integers. 


This instruction may execute faster if rB contains the operand having 
the smaller absolute value. 


mulhw = Multiply High Word 
mulhw. Multiply High Word with CR Update. The dot suffix enables 
the update of the condition register. 


Pf 
Pf 







Multiply 
High Word 
Unsigned 


The contents of rA and of rB are extracted and interpreted as 32-bit 
unsigned integers. The 64-bit product is formed. The high-order 32 
bits of the 64-bit product are placed into rD. 


Both operands and the product are interpreted as unsigned integers. 





This instruction may execute faster if rB contains the operand having 
the smaller absolute value. 


mulhwu = Multiply High Word Unsigned 
mulhwu. Multiply High Word Unsigned with CR Update. The dot 
suffix enables the update of the condition register. 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand . 
ed ee 


Divide Word | divw rD,rA,rB The dividend is the signed value of (rA). The divisor is the signed 
value of (rB). The 64-bit quotient is formed. The low-order 32 bits of 
the 64-bit quotient are placed into rD. The remainder is not supplied 
as a result. 

Both operands are interpreted as signed integers. The quotient is the 
unique signed integer that satisfies the following: 






















dividend = (quotient times divisor) +r 
where 0 <r < |divisor| if the dividend is non-negative, and -|divisor| < 
r <0 if the dividend is negative. 

If an attempt is made to perform any of the divisions 





x'8000_0000'/ -1 
or 
<anything> / 0 


the contents of register rD are undefined, as are the contents of the 
LT, GT, and EQ bits of the condition register field CRO if the 
instruction has condition register updating enabled. !n these cases, if 
instruction overflow is enabled, then XER[OV] is set. 

The 32-bit signed remainder of dividing (rA) by (rB) can be computed 
as follows, except in the case that (rA) = -23" and (rB) = -1: 


divw rD,rA,rB_ —srD = quotient 
mull rD,rD,rB. ss rD = quotient*divisor 
subf rD,rD,rA  rD=remainder 
















divw Divide Word 

divw. Divide Word with CR Update. The dot suffix enables the 
update of the condition register. 

divwo Divide Word with Overflow. The o suffix enables the overflow 
bit (OV) in the XER. 

divwo. Divide Word with Overflow and CR Update. The o. suffix 

enables the update of the condition register and enables the 

overflow bit (OV) in the XER. 





MOTOROLA Chapter 3. Addressing Modes and Instruction Set Summary 3-9 


Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand : 
| rome | aemene | mt | craton 


Divide divwu rD,rA,rB The dividend is the value of (rA). The divisor is the value of (rB). The 































Word divwu. 32 bit quotient is placed into rD. The remainder is not supplied as a 
Unsigned divwuo result. 
divwuo. 





Both operands are interpreted as unsigned integers. The quotient is 
the unique unsigned integer that satisfies the following: 


dividend = (quotient times divisor) +r 

where 0 <r < divisor. 

If an attempt is made to perform the division 

<anything> / 0 

the contents of register rD are undefined, as are the contents of the 
LT, GT, and EQ bits of the condition register field CRO if the 


instruction has the condition register updating enabled. In these 
cases, if instruction overflow is enabled, then XER[OV] is set. 


The 32-bit unsigned remainder of dividing (rA) by (rB) can be 
computed as follows: 


divwu rD,rA,rB_—srD = quotient 
mull rD,rD,rB rD = quotient*divisor 
subf rD,rD,rA rD = remainder 

















divwu Divide Word Unsigned 

divwu. Divide Word Unsigned with CR Update. The dot suffix 
enables the update of the condition register. 

divwuo Divide Word Unsigned with Overflow. The o suffix enables 
the overflow bit (OV) in the XER. 

divwuo. Divide Word Unsigned with Overflow and CR Update. The 
o. suffix enables the update of the condition register and 

enables the overflow bit (OV) in the XER. 













Difference This is a POWER instruction, and is not part of the PowerPC 
or Zero architecture. This instruction will not be supported by other 
Immediate PowerPC implementations. 


The sum —(rA) + SIMM + 1 is placed into register rD. If the value in 
register rAis algebraically greater than the value of the SIMM field, 
register rD is cleared. 


y 7 


This instruction is specific to the MPC601. 
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Difference 
or Zero 


Absolute 


MOTOROLA 


Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand 
Syntax 


rD,rA,rB 


Operation 


This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


The sum —(rA) + (rB) + 1 is placed into register rD. If the value in 
register rAis algebraically greater than the value in register rB, 
register rD is cleared. 

if the instruction has condition register updating enabled, condition 
register field CRO is set to reflect the result placed in register rD (i.e., 
if register rD is set to zero, EQ is set to 1). 


If the instruction has overflow enabled, XER[OV] is only set on 
positive overflows. 


doz Difference or Zero 

doz. Difference or Zero with CR Update. The dot suffix enables 
the update of the condition register. 

dozo Difference or Zero with Overflow. The o suffix enables the ' 
overflow bit (OV) in the XER. 

dozo. Difference or Zero with Overflow and CR Update. The o. 
suffix enables the update of the condition register and 
enables the overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 


This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


The absolute value |(rA)| is placed into register rD. If register rA 
contains the most negative number (i.e., x '80000000'), the result of 
the instruction is the most negative number and sets the XER[OV] bit 
if enabled. 


abs Absolute 

abs. Absolute with CR Update. The dot suffix enables the 
update of the condition register. 

abso Absolute with Overflow. The o suffix enables the overflow 
bit (OV) in the XER 

abso. Absolute with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand , 
es 


Negative rD,rA This is a POWER instruction, and is not part of the PowerPC 
Absolute architecture. This instruction will not be supported by other 


PowerPC implementations. 
| il 





















The negative absolute value -|(rA)| is placed into register rD. 


Note: nabs never overflows. If the instruction is overflow enabled, 
then XER[OV] is cleared to zero and XER[SO] is not changed. 


nabs Negative Absolute 

nabs. Negative Absolute with CR Update. The dot suffix enables 
the update of the condition register. 

nabso _— Negative Absolute with Overflow. The o suffix enables the 
overflow bit (OV) in the XER 

nabso. Negative Absolute with Overflow and CR Update. The o. 
suffix enables the update of the condition register and 
enables the overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 



































This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


Bits 0-31 of the product (rA)*(rB) are placed into register rD. Bits 
32-63 of the product (rA)*(rB) are placed into the MQ register. 


If the condition register updating is enabled, then LT, GT, and EQ 
reflect the result in the MQ register’s low-order 32 bits. If the 
instruction is overflow enabled, then the XER[SO] and XER[OV] bits 
are set to one if the product cannot be represented in 32 bits. 


mul Multiply 

mul. Multiply with CR Update. The dot suffix enables the update 
of the condition register. 

mulo Multiply with Overflow. The o suffix enables the overflow 
bit (OV) in the XER. 

mulo. Multiply with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand : 
ee ee 


Divide rD,rA,rB This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


The quotient {(rA) || (MQ)}/(rB) is placed into register rD. The 
remainder is placed in the MQ register. The remainder has the same 
sign as the dividend, except that a zero quotient or a zero remainder 
is always positive. The results obey the equation: 
































dividend = (divisor * quotient) + remainder 


where dividend is the original (rA) || (MQ), divisor is the original (rB), 
quotient is the final (rD), and remainder is the final (MQ). 


If the condition register updating is enabled, condition register field 
CRO bits LT, GT, and EQ reflect the remainder. If the instruction is 
overflow enabled, then the XER[SO] and XER[OV] bits are set to one 
if the quotient cannot be represented in 32 bits. 


For the case of -2°'/-1, the MQ register is cleared to zero and -2°" is 

placed in register rD. For all other overflows, (MQ), (rD), and 

condition register field CRO (if condition register updating is enabled) 

are undefined. 

div Divide 

div. Divide with CR Update. The dot suffix enables the update 
of the condition register. 

divo Divide with Overflow. The o suffix enables the overflow bit 
(OV) in the XER. 

divo. Divide with Overflow and CR Update. The o. suffix enables 
the update of the condition register and enables the 
overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 
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Table 3-1. Integer Arithmetic Instructions (Continued) 


Operand : 
ee 


Divide Short | divs rD,rA,rB This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 

The quotient (rA)/({rB) is placed into register rD. The remainder is 
placed in MQ. The remainder has the same sign as the dividend, 
except that a zero quotient or a zero remainder is always positive. 
The results obey the equation: 
dividend = (divisor * quotient) + remainder 
where the dividend is the original (rA), divisor is the original (rB), 
quotient is the final (rD), and remainder is the final (MQ). 
If the condition register updating is enabled, then the condition 
register field CRO bits LT, EQ, and GT reflect the remainder. If the 
instruction is overflow enabled, then the XER[SO] and XER[OV] bits 
are set to one if the quotient cannot be represented in 32 bits (e.g., as 
is the case when the divisor is zero, or the dividend is -231 and the 
divisor is -1). For the case of -231/-1, the MQ register is cleared and - 
231 is placed in register rD. For all other overflows, (MQ), (rD), and 
condition register field CRO (if condition register updating is enabled) 
are undefined. 
divs Divide Short 
divs. Divide Short with CR Update. The dot suffix enables the 
update of the condition register. 
divso Divide Short with Overflow. The 0 suffix enables the 






























overflow bit (OV) in the XER. 

divso. _ Divide Short with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables 
the overflow bit (OV) in the XER. 


This instruction is specific to the MPC601. 







In addition to supporting all of the PowerPC integer arithmetic instructions, the MPC601 
supports the POWER arithmetic instructions summarized in Table 3-1 and Table 3-2 and 
described in detail in Chapter 10, “Instruction Set.” Note that in order to achieve full 
compatibility with future PowerPC implementations, it is up to software to either emulate 
these operations in the program exception handler, or to completely avoid their use. 


Table 3-2. MPC601-Specific Integer Arithmetic Instruction Summary 
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3.3.2 Integer Compare Instructions 

The integer compare instructions algebraically or logically compare the contents of register 
rA with either the UIMM operand, the SIMM operand or the contents of register rB. 
Algebraic comparison compares two signed integers. Logical comparison compares two 
unsigned numbers 


The L field specifies whether the operands are treated as 32- or 64 bit values. The simplified 
mnemonics for integer compare instructions are shown in Table 3-4 correctly clear the L 
value in the instruction rather than requiring it to be coded as a numeric operand. 


Table 3-3. Integer Compare Instructions 


Operand ‘ 
ee 


Compare crfD,L,rA,SIMM | The contents of register rA is compared with the sign-extended 

Immediate value of the SIMM operand, treating the operands as signed 
integers. The result of the comparison is placed into the CR field 
specified by operand crfD. 


Compare erfD,L,rA,rB The contents of register rA is compared with register rB, treating 
the operands as signed integers. The result of the comparison is 
placed into the CR field specified by operand erfD. 


Compare crfD,L,rA,UIMM | The contents of register rA is compared with x'0000’ || UIMM, 
Logical treating the operands as unsigned integers. The result of the 
Immediate comparison is placed into the CR field specified by operand erfD. 


Compare crfD,L,rA,rB The contents of register rA is compared with register rB, treating 
Logical the operands as unsigned integers. The result of the comparison is 
placed into the CR field specified by operand crfD. 


The crfD field can be omitted if the result of the comparison is to be placed in CRO. 
Otherwise the target CR field must be specified in the instruction erfD field, using one of 
the CR field symbols (CRO-CR7) or an explicit field number. 





The instructions listed in Table 3-4 are simplified mnemonics supported in all PowerPC 
implementations that provide compare word capability for 32-bit operands. 


Table 3-4. Word Compare Simplified Mnemonics 


Operation Simplified Mnemonic 
Compare Word Immediate | cmpwi crfD,rA,SIMM cmpi crfD,0,rA,SIMM 
Compare Word cmpw crfD,rA,rB cmp crfD,0,rA,rB 


Compare Logical Word cmp!wi crfD,rA,UIMM cmpli crfD,0,rA,UIMM 
Immediate 


Compare Logical Word cmplw crfD,rA,rB cmpl crfD,0,rA,rB 


The following examples demonstrate the use of the word compare mnemonics as a way to 
simplify instruction coding: 
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1. Compare 32 bits in register rA with immediate value 100 and place result in 
condition register field CRO. 
cmpwi rA,100 (equivalent to cmpi 0,0,rA,100) 

2. Same as (1), but place results in condition register field CR4. 
cmpwicr4,rA,100 (equivalent to cmpi 4,0,rA,100) 

3. Compare registers rA and rB as logical 32-bit quantities and place result in 
condition register field CRO. 
cmplw rA,rB (equivalent to cmpl 0,0,rA,rB) 


3.3.3 Integer Logical Instructions 


The logical instructions shown in Table 3-5 perform bit-parallel operations. Logical 
instructions with the condition register update enabled and instructions andi. and andis. set 
condition register field CRO to characterize the result of the logical operation. These fields 
are set as if the sign-extended low-order 32 bits of the result were algebraically compared 
to zero. Logical instructions without condition register update and the remaining logical 
instructions do not modify the condition register. Logical instructions do not change the 
XER[SO], XER[OV], and XER[CA] bits. 


Table 3-5. Integer Logical Instructions 


Operand : 
j tame fmemonc] rent | opamten | 
AND rA,rS,UIMM | The contents of rS is ANDed with x'0000' || UIMM and the result is 
Immediate placed into rA. 


AND rA,rS,UIMM_ | The contents of rS is ANDed with UIMM || x’0000' and the result is 
Immediate placed into rA. 
Shifted 
OR rA,rS,UIMM | The contents of rS is ORed with x’0000’ || UIMM and the result is 
Immediate placed into rA. 

The preferred no-op is ori 0,0,0 
OR rA,tS,UIMM_ | The contents of rS is ORed with UIMM ||x’0000’ and the result is 
Immediate placed into rA. 
Shifted 


rA,tS,UIMM | The contents of rS is XORed with x'0000’ || UIMM and the result is 
tas placed into rA. 

rA,rS,UIMM | The contents of rS is XORed with U!IMM ||x'0000’ and the result is 
caine placed into rA. 
Shifted 


AND and rAjrS,rB The contents of rS is ANDed with the contents of register rB and the 
and. result is placed into rA. 
and AND 
and. AND with CR Update. The dot suffix enables the update of 
the condition register. 
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Equivalent 


AND with 
Complement 


OR with 
Complement 


Extend Sign 


Byto 


MOTOROLA 


Table 3-5. Integer Logical Instructions (Continued) 


Operand : 
rA,rS,rB The contents of rS is ORed with the contents of rB and the result is 
placed into rA. 
or OR 
or. OR with CR Update. The dot suffix enables the update of the 
condition register. 
rA,rS,rB The contents of rS is XORed with the contents of rB and the result is 
placed into register rA. 
xor XOR 
xor. XOR with CR Update. The dot suffix enables the update of 
the condition register. 
nand rA,rS,rB The contents of rS is ANDed with the contents of rB and the one's 
nand. complement of the result is placed into register rA. 
nand NAND 
nand. NAND with CR Update. The dot suffix enables the update of 
the condition register. 
NAND with rA=rB can be used to obtain the one's complement. 
rArS,rB The contents of rS is ORed with the contents of rB and the one’s 
complement of the result is placed into register rA. 
nor NOR 
nor. NOR with CR Update. The dot suffix enables the update of 
the condition register. 
NOR with rA=rB can be used to obtain the one's complement. 


eqv rA,rS,rB The contents of rS is XORed with the contents of rB and the 
eqv. complemented result is placed into register rA. 
eqv Equivalent 
eqv. Equivalent with CR Update. The dot suffix enables the 
update of the condition register. 
andc rAyrS,rB The contents of rS is ANDed with the complement of the contents of 
andc. rB and the result is placed into rA. 
andc AND with Complement 
ande. AND with Complement with CR Update. The dot suffix 
enables the update of the condition register. 
rA,rS,rB The contents of rS is ORed with the complement of the contents of rB 
and the result is placed into rA. 
orc OR with Complement 
orc. OR with Complement with CR Update. The dot suffix 
enables the update of the condition register. 
rA,rS Register r S[24—31] are placed into rA[24—31]. Bit 24 of rS is placed 
into rA{O-23]. 
extsb Extend Sign Byte 
extsb. Extend Sign Byte with CR Update. The dot suffix enables the 
update of the condition register. 
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Table 3-5. Integer Logical Instructions (Continued) 


Operand : 
j_tame[rmemonc| emt | erm 


Extend Sign extsh rArS Register r S[16-31] are placed into rA[16-31]. Bit 16 of rS is placed 
Half Word extsh. into rA[O—15]. 
extsh Extend Sign Half Word 
extsh. Extend Sign Half Word with CR Update. The dot suffix 
enables the update of the condition register. 





Count rA,rS Acount of the number of consecutive zero bits of rS is placed into rA. 
Leading This number ranges from 0 to 32, inclusive. 
Zeros Word cntlzw Count Leading Zeros Word 
cntlzw. Count Leading Zeros Word with CR Update. The dot suffix 
enables the update of the condition register. 
When the Count Leading Zeros Word instruction has condition 
register updating enabled, the LT field is cleared to zero in CRO. 


3.3.4 Integer Rotate and Shift Instructions 


Rotate and shift instructions provide powerful and general ways to manipulate register 
contents. Simplified mnemonics allow some of the simpler operations to be coded easily. 
Mnemonics are provided for the types of operation shown in Table 3-6. 


Table 3-6. Rotate and Shift Operations 


Description 


Select a field of n bits starting at bit position bin the source register, right or left justify this field in the 
target register, and clear all other bits of the target register to zero. 


Insert Select a left- or right-justified field of n bits in the source register, insert this field starting at bit position 
bof the target register, and leave other bits of the target register unchanged. (No simplified mnemonic 
is provided for insertion of a left-justified field when operating on double-words; such an insertion 
requires more than one instruction.) 


Rotate the contents of a register right or left n bits without masking. 


Shift the contents of a register right or left n bits, clearing vacated bits to 0 (logical shift). 
Clear the leftmost or rightmost n bits of a register to 0. 


Clear left Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to 
and shift scale a known non-negative array index by the width of an element. 
left 





The IU performs rotation operations on data from a GPR and returns the result, or a portion 
of the result, to a GPR. Rotation operations rotate a 32-bit quantity left by a specified 
number of bit positions. Bits that exit from position 0 enter at position 31. 


Rotate and shift instructions employ a mask generator. The mask is 32 bits long and consists 
of 1-bits from a start bit, MB, through and including a stop bit, ME, and 0-bits elsewhere. 
The values of MB and ME range from zero to 31. If MB > ME, the 1-bits wrap around from 
position 31 to position 0. Thus the mask is formed as follows: 
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if MB < ME then 


mask[mstart—mstop] = ones 

mask[all other bits] = zeros 
else 

mask[mstart—31] = ones 

mask[0—mstop] = ones 

mask[all other bits] = zeros 


There is no way to specify an all-zero mask. The use of the mask is described in the 
following sections. 


If condition register updating is enabled, rotate and shift instructions set condition register 
field CRO according to the contents of rA at the completion of the instruction. Rotate and 
shift instructions do not change the values of XER[OV] and XER[SO] bits. Rotate and shift 
instructions, except algebraic right shifts, do not change the XER[CA] bit. 


Simplified mnemonics allow simpler coding of often-used functions such as clearing the 
leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, 
and simple rotates and shifts. Some of these are shown as examples with the rotate 
instructions. 


POWER Compatibility Note: In addition to supporting all of the PowerPC integer rotate 
and shift instructions, the MPC6()1 also supports all POWER rotate and shift instructions. 
Note that in order to achieve full compatibility with all POWER applications on future 
PowerPC implementations, it is left up to software to either emulate these operations in the 
instruction exception handler, or to completely avoid their use. These MPC601-specific 
rotate and shift instructions are summarized in Table 3-7. 


Table 3-7. MPC601-Specific Rotate and Shift Instructions 


[eee | nents 
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Table 3-7. MPC601-Specific Rotate and Shift Instructions (Continued) 


eon | ne 


3.3.4.1 Integer Rotate Instructions 

Integer rotate instructions rotate the contents of a register. The result of the rotation is 
inserted into the target register under control of a mask (if a mask bit is 1 the associated bit 
of the rotated data is placed into the target register, and if the mask bit is 0 the associated 
bit in the target register is unchanged), or ANDed with a mask before being placed into the 
target register. 















Rotate left instructions allow right-rotation of the contents of a register to be performed by 
a left-rotation of 32-n, where n is the number of bits by which to rotate right 


3.3.4.2 Integer Shift Instructions 

The instructions in this section perform left and right shifts. Immediate-form logical 
(unsigned) shift operations are obtained by specifying masks and shift values for certain 
rotate instructions. Simplified mnemonics are provided to make coding of such shifts 
simpler and easier to understand. 


Any shift right algebraic instruction, followed by addze, can be used to divide quickly by 
2 


Multiple-precision shifts can be programmed as shown in Appendix E, “Multiple-Precision 
Shifts.” 
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The integer rotate and shift instructions are summarized in Table 3-8. 


Table 3-8. Integer Rotate Instructions 


Rotate Left | rlwinm rA,rS,SH,MB,ME { The contents of register rS are rotated left by the number of bits 
Word riwinm. specified by operand SH. A mask is generated having 1-bits from 
Immediate the bit specified by operand MB through the bit specified by 

then AND operand ME and 0-bits elsewhere. The rotated data is ANDed with 
with Mask the generated mask and the result is placed into register rA. 


tlwinm Rotate Left Word Immediate then AND with Mask 

tlwinm. Rotate Left Word Immediate then AND with Mask with 
CR Update. The dot suffix enables the update of the 
condition register. 

Simplified mnemonics: 

extlwirA,rS,n,b = rlwinm rA,rS,b,0,n-1 

srwi rA,rS,n riwinm rA,rS,32-n,n,31 

clrrwi rA,rS,n rlwinm rA,rS,0,0,31-n 

Note: The rlwinm instruction can be used for extracting, clearing 

and shifting bit fields using the methods shown below: 





To extract an n-bit field that starts at bit position bin register rS, . 
right-justified into rA (clearing the remaining 32-n bits of rA), set 
SH=b+n, MB=32-n, and ME=31. 


To extract an n-bit field that starts at bit position bin rs, left- 
justified into rA, set SH=b, MB = 0, and ME=n-1. 

To rotate the contents of a register left (right) by n bits, set SH=n 
(32-n), MB=0, and ME=31. 

To shift the contents of a register right by n bits, set SH=32-n, 
MBe=n, and ME=31. 

To clear the high-order b bits of a register and then shift the result 
left by n bits, set SH=n, MB=b-n and ME=31-n. 


To clear the low-order n bits of a register, set SH=0, MB=0, and 
ME=31-n. 
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Table 3-8. Integer Rotate Instructions (Continued) 


Rotate Left rA,rS,rB,MB,ME | The contents of rS are rotated left by the number of bits specified 
Word then by rB[27-31]. A mask is generated having 1-bits from the bit 
AND with specified by operand MB through the bit specified by operand ME 
Mask and 0-bits elsewhere. The rotated data is ANDed with the 
generated mask and the result is placed into rA. 


rlwinm Rotate Left Word then AND with Mask ; 

rlwinm. Rotate Left Word then AND with Mask with CR Update. 
The dot suffix enables the update of the condition 
register. 


Simplified mnemonics: 
rotlw rAjrS,rB_os rlwnmrA,rS,rB,0,31 


Note: The rlwinm instruction can be used to extract and rotate bit 
fields using the methods shown below: 







































To extract an n-bit field that starts at the variable bit position bin 
the register specified by operand rS, right-justified into rA (clearing 
the remaining 32-n bits of rA), set r B[27-31]=b+n, MB=32-n, and 
ME=31. 

To extract an n-bit field that starts at variable bit position bin the 
register specified by operand FS, left-justified into rA (clearing the 
remaining 32-n bits of rA), set rB[27-31]=b, MB = 0, and ME=n-1. 


To rotate the contents of the low-order 32 bits of a register left 
(right) by variable n bits, set rB[27—31]=n (82-n), MB=0, and 
ME=31. 

































Rotate Left 
Word 

Immediate 
then Mask 
Insert 


rA,rS,SH,MB,ME | The contents of rS are rotated left by the number of bits specified 
by operand SH. A mask is generated having 1-bits from the bit 
specified by MB through the bit specified by ME and 0-bits 
elsewhere. The rotated data is inserted into rA under control of the 


generated mask. 


rlwimi Rotate Left Word Immediate then Mask 

rlwimi. Rotate Left Word Immediate then Mask Insert with CR 
Update. The dot suffix enables the update of the 
condition register. 


Simplified mnemonic: 
inslwrA,rS,n,b  rlwim rA,rS,32-b,b,b+n-1 


Note: The opcode rlwimi can be used to insert a bit field into the 
contents of register specified by operand rA using the methods 
shown below: 

To insert an n-bit field that is left-justified in rS into rA starting at bit 
position b, set SH=32-b, MB=b, and ME=(b+n)-1. 

To insert an n-bit field that is right-justified in rS into rA starting at 
bit position b, set SH=32-(b+n), MB=b, and ME=(b+n)-1. 
Simplified mnemonics are provided for both of these methods. 
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Table 3-8. Integer Rotate Instructions (Continued) 


Rotate Left rAvS,rB,MB,ME | This is a POWER instruction, and is not part of the PowerPC 
then Mask architecture. This instruction will not be supported by other 
Insert PowerPC implementations. 

























The contents of rS is rotated left the number of positions specified 
by bits 27~31 of rB. The rotated data is inserted into rA under 
control of the generated mask. 


rmi Rotate Left then Mask Insert 
rimi. Rotate Left then Mask Insert with CR Update. The dot 
suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 



































Rotate 
Right and 
Insert Bit 


rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 


PowerPC implementations. 


Bit 0 of rS is rotated right the amount specified by bits 27-31 of rB. 
The bit is then inserted into rA. 


rrib Rotate Right and Insert Bit 

rrib. Rotate Right and Insert Bit with CR Update. The dot 
suffix enables the update of the condition register. 

This instruction is specific to the MPC601. 










Mask maskg rA,rS,rB 
Generate maskg. 


This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


Let mstart = rS[27-31], specifying the starting point of a mask of 
ones. Let mstop = rB[27-31], specifying the end point of the mask 
of ones. 






If mstart < mstop+1 then 
MASK(mstart...mstop) = ones 
MASK {all other bits) = zeros 

If mstart = mstop+1 then 
MASK(0-31) = ones 

If mstart > mstop+1 then 
MASK(mstop+1...mstart-1) = zeros 

MASK{(all other bits) = ones 


MASK is then placed in rA. 
maskg Mask Generate 
maskg. Mask Generate with CR Update. The dot suffix 
enables the update of the condition register. 
This instruction is specific to the MPC601. 


































Mask 
Insert from 
Register 


maskir 
maskir. 


rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 


PowerPC implementations. 


Q, 
ae 
=F 
Q 
3 
3) 
oO 
en 
5 
iad 3 
w 


Register rS is inserted into TA under contre! 
maskir Mask Insert from Register 
maskir. Mask Insert from Register with CR Update. The dot 
suffix enables the update of the condition register. 
This instruction is specific to the MPC601. 
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The integer shift instructions are summarized in Table 3-9. 


Table 3-9. Integer Shift Instructions 


Operand ‘ 
ee ee 


Shift Left rArS,rB The contents of rS are shifted left the number of bits specified by 
Word — rB[26-31]. Bits shifted out of position 0 are lost. Zeros are supplied to 
the vacated positions on the right. The 32-bit result is placed into rA. 
If rB[26]=1, then rA is filled with zeros. 
siw Shift Left Word 
slw. Shift Left Word with CR Update. The dot suffix enables 
the update of the condition register. 

Shift Right rA,rS,rB The contents of rS are shifted right the number of bits specified by 
Word rB[26—31]. Zeros are supplied to the vacated positions on the left. 
Shift Right srawi rA,rS,SH 

srawi. 


The 32-bit result is placed into rA. 
Algebraic 
sraw rA,rS,rB 
sraw. 














srw. Shift Right Word with CR Update. The dot suffix enables 
the update of the condition register. 










The contents of rS are shifted right the number of bits specified by 
operand SH. Bits shifted out of position 31 are lost. The 32-bit result 
is sign extended and placed into rA. XER[CA] is set if r S contains a 
negative number and any 1-bits are shifted out of position 31; 
otherwise XER(CA) is cleared. An operand SH of zero causes rA to 
be loaded with the contents of rS and XER[CA] to be cleared to 0. 


srawi Shift Right Algebraic Word Immediate 
srawi. Shift Right Algebraic Word Immediate with CR Update. 
The dot suffix enables the update of the condition register. 





If rB[26]=1, then rA is filled with zeros. 
srw Shift Right Word 

Word 

Immediate 























Shift Right 
Algebraic 
Word 


The contents of rS are shifted right the number of bits specified by 
rB[26-31]. The 32-bit result is placed into rA. XER[CA] is set to 1 if rS 
contains a negative number and any 1-bits are shifted out of position 
31; otherwise XER[CA] is cleared to 0. An operand (rB) of zero 
causes fA to be loaded with the contents of rS, and XER[CA] to be 
cleared to 0. If rB[26]=1, then rA is filled with 32 sign bits (bit 0) from 
rs. If rB[26]=0, then rA is filled from the left with sign bits. Condition 
register field CRO is set based on the value written into rA. 


sraw Shift Right Algebraic Word 
sraw. Shift Right Algebraic Word with CR Update. The dot suffix 
enables the update of the condition register. 
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Table 3-9. integer Shift Instructions (Continued) 


Operand : 
frame | memonc| aese | mtn 


Shift Left slq rAyrS,rB This is a POWER instruction, and is not part of the PowerPC 
with MQ siq. architecture. This instruction will not be supported by other 
PowerPC implementations. 
Register rS is rotated left n bits where nis the shift amount specified 
in bits 27-31 of register rB. The rotated word is placed in the MQ 
register. 
When bit 26 of register rB is a zero, a mask of 32-n ones followed by 
n zeros is generated. 
When bit 26 of register rB is a one, a mask of all zeros is generated. 
The logical AND of the rotated word and the generated mask is 
placed into register rA. 
siq Shift Left with MQ 
slq. Shift Left with MQ with CR Update. The dot suffix enables 
the update of the condition register. 
This instruction is specific to the MPC601. 
Shift Right srq rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
with MQ srq. architecture. This instruction will not be supported by other 
7 ; 






























PowerPC implementations. 





Register rS is rotated left 32—n bits where nis the shift amount 
specified in bits 27-31 of register rB. The rotated word is placed into 
the MQ register. When bit 26 of register rB is a zero, a mask of n 
zeros followed by 32-n ones is generated. 












When bit 26 of register rB is a one, a mask of all zeros is generated. 
The logical AND of the rotated word and the generated mask is 
placed in rA. 


srq Shift Right with MQ 
srq. Shift Right with MQ with CR Update. The dot suffix 
enables the update of the condition register. 


This instruction is specific to the MPC601. 











Shift Left This is a POWER instruction, and is not part of the PowerPC 
Immediate architecture. This instruction will not be supported by other 
with MQ PowerPC implementations. 





Register rS is rotated left nbits where nis the shift amount specified 
by operand SH. The rotated word is placed in the MQ register. A 
mask of 32—n ones followed by n zeros is generated. The logical AND 
of the rotated word and the generated mask is placed into register rA. 


sliq Shift Left Immediate with MQ 
sliq. Shift Left Immediate with MQ with CR Update. The dot 
suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 














MOTOROLA Chapter 3. Addressing Modes and Instruction Set Summary 3-25 


Table 3-9. Integer Shift Instructions (Continued) 


Operand . 
[name |mamenc] meme | opomton 


Shift Right rAs,SH This is a POWER instruction, and is not part of the PowerPC 
Immediate architecture. This instruction will not be supported by other 
with MQ PowerPC implementations. 
Register rS is rotated left 32—n bits where nis the shift amount 
specified by operand SH. The rotated word is placed into the MQ 
register. A mask of n zeros followed by 32—n ones is generated. The 
logical AND of the rotated word and the generated mask is placed in 
register rA. 
sriq Shift Right Immediate with MQ 
sriq. Shift Right Immediate with MQ with CR Update. The dot 


suffix enables the update of the condition register. 
This instruction is specific to the MPC601. 


Shift Left rA,tS,SH This is a POWER instruction, and is not part of the PowerPC 
Long P architecture. This instruction will not be supported by other 
Immediate PowerPC implementations. 
with: Me Register rS is rotated left n bits where nis the shift amount specified 
by SH. Amask of 32—n ones followed by n zeros is generated. The 
rotated word is then merged with the contents of MQ, under control of 
the generated mask. The merged word is placed into rA. The rotated 
word is placed into the MQ register. 
slliq Shift Left Long Immediate with MQ 


slliq. Shift Left Long Immediate with MQ with CR Update. The 
dot suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 


Shift Right rA,rS,SH This is a POWER instruction, and is not part of the PowerPC 

Long 2 architecture. This instruction will not be supported by other 

immediate PowerPC implementations. 

win MG Register rS is rotated left 32—n bits where nis the shift amount 
specified by operand SH. A mask of n zeros followed by 32—n ones is 
generated. The rotated word is then merged with the contents of the 
MQ register, under control of the generated mask. The merged word 
is placed in register rA. The rotated word is placed into the MQ 
register. 
srliq Shift Right Long Immediate with MQ 


stliq. Shift Right Long Immediate with MQ with CR Update. The 
dot suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 
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Table 3-9. Integer Shift Instructions (Continued) 


Operand ; 
jane |atmmenc | Bret | oreo 


Shift Left silq rAtS,rB This is a POWER instruction, and is not part of the PowerPC 
Long with silq. architecture. This instruction will not be supported by other 
MQ PowerPC implementations. 
Register rS is rotated left n bits where nis the shift amount specified 
in bits 27-31 of register rB. 
When bit 26 of register rB is a zero, a mask of 32-n ones followed by 
nzeros is generated. The rotated word is then merged with the 
contents of the MQ register, under control of the generated mask. 
When bit 26 of register rB is a one, a mask of 32-n zeros followed by 
nones is generated. A word of zeros is then merged with the contents 
of the MQ register, under control of the generated mask. 
The merged word is placed in register rA. The MQ register is not 
altered. 
silq Shift Left Long with MQ 
silq. Shift Left Long with MQ with CR Update. The dot suffix 
enables the update of the condition register. 
This instruction is specific to the MPC601. 


Shift Right srlq rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
Long with sriq. architecture. This instruction will not be supported by other 
sle 
sle. 







































Register rS is rotated left 32-n bits where nis the shift amount 
specified in bits 27-31 of register rB. 


When bit 26 of register rB is a zero, a mask of n Zeros followed by 
32-n ones is generated. The rotated word is then merged with the 
contents of the MQ register, under control of the generated mask. 





When bit 26 of register rB is a one, a mask of n ones followed by 32— 
nzeros is generated. A word of zeros is then merged with the 
contents of the MQ register, under control of the generated mask. 


The merged word is placed in register rA. The MQ register is not 
altered. 


sriq Shift Right Long with MQ 


srlq. Shift Right Long with MQ with CR Update. The dot suffix 
enables the update of the condition register. 


This instruction is specific to the MPC601. 


Shift Left 
Extended 


This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 


Register rS is rotated left nbits where nis the shift amount specified 
in bits 27-31 of register rB. The rotated word is placed in the MQ 
register. A mask of 32—n ones followed by n Zeros is generated. 


The logical AND of the rotated word and the generated mask is 
placed in register rA. 


sle Shift Left Extended 
sle. Shift Left Extended with CR Update. The dot suffix 
enables the update of the condition register. 


This instruction is specific to the MPC601. 







MQ PowerPC implementations. 
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Table 3-9. Integer Shift Instructions (Continued) 


Operand F 
[rome | amemonc} am? | mtn 


Shift Right sre rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
Extended sre, architecture. This instruction will not be supported by other 
PowerPC implementations. 
Register rS is rotated left 32-n bits where nis the shift amount 
specified in bits 27-31 of register rB. The rotated word is placed into 
the MQ register. A mask of n zeros followed by 32—n ones is 
generated. 
The logical AND of the rotated word and the generated mask is 
placed in register rA. 
sre Shift Right Extended 
sre. Shift Right Extended with CR Update. The dot suffix 
enables the update of the condition register. 
This instruction is specific to the MPC601. 
sleq rA,rS,rB 
sleq. 
sreq rA,rS,rB 
sreq. 




























Shift Left This is a POWER instruction, and is not part of the PowerPC 
Extended architecture. This instruction will not be supported by other 
with MQ PowerPC implementations. 





Register rS is rotated left n bits where nis the shift amount specified 
in bits 27-31 of register rB. A mask of 32—n ones followed by n zeros 
is generated. The rotated word is then merged with the contents of 
the MQ register, under control of the generated mask. The merged 
word is placed in register rA. The rotated word is placed in the MQ 
register. . 


sleq Shift Left Extended with MQ 
sleq. Shift Left Extended with MQ with CR Update. The dot 
suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 

















Shift Right 
Extended 
with MQ 


This is a POWER instruction, and is not part of the PowerPC 
architecture. This instruction will not be supported by other 
PowerPC implementations. 






Register rS is rotated left 32—n bits where nis the shift amount 
specified in bits 27-31 of register rB. A mask of n zeros followed by 
32—n ones is generated. The rotated word is then merged with the 
contents of the MQ register, under control of the generated mask. 
The merged word is placed in register rA. The rotated word is placed 
into the MQ register. 


sreq Shift Right Extended with MQ 
sreq. Shift Right Extended with MQ with CR Update. The dot 
suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 
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Table 3-9. Integer Shift Instructions (Continued) 


Operand , : 
jane | aamenc | Britt | mtn 


Shift Right rA,rS,SH This is a POWER instruction, and is not part of the PowerPC 













































Algebraic architecture, This instruction will not be supported by other 
Immediate PowerPC implementations. 
with MQ 


Register rS is rotated left 32—n bits where nis the shift amount 
specified by the operand SH. A mask of n zeros followed by 32-n 
ones is generated. The rotated word is placed in the MQ register. 


The rotated word is then merged with a word of 32 sign bits from 
register rS, under control of the generated mask. The merged word is 
placed in register rA. The rotated word is ANDed with the 
complement of the generated mask. This 32-bit result is ORed 
together and then ANDed with bit 0 of register rS to produce 
XER[CA]. 


Shift Right Algebraic instructions can be used for a fast divide by 2" if 
followed with addze. 


sraiq Shift Right Algebraic Immediate with MQ 
sraiq. Shift Right Algebraic Immediate with MQ with CR Update. 
The dot suffix enables the update of the condition register. 


This instruction is specific to the MPC601. 





























Shift Right This is a POWER instruction, and is not part of the PowerPC 
Algebraic architecture. This instruction will not be supported by other 
with MQ PowerPC implementations. 


Register rS is rotated left 32—n bits where nis the shift amount 
specified in bits 27-31 of register rB. When bit 26 of register rBis a 
zero, a mask of n zeros followed by 32—n ones is generated. When bit 
26 of register rB is a one, a mask of all zeros is generated. The 
rotated word js placed in the MQ register. The rotated word is then 
merged with a word of 32 sign bits from register rS, under control of 
the generated mask. 


The merged word is placed in register rA. 


The rotated word is ANDed with the complement of the generated 
mask. This 32-bit result is ORed together and then ANDed with bit 0 
of register rS to produce XER[CA]. 


Shift Right Algebraic instructions can be used for a fast divide by 2" if 
followed with addze. 


sraq Shift Right Algebraic with MQ 

sraq. Shift Right Algebraic with MQ with CR Update. The dot 
suffix enables the update of the condition register. 

This instruction is specific to the MPC601. 


sraq rA,rS,rB 
sraq. 
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Operand . 
jane J mononc| Smet | mmo 


Shift Right srea rA,rS,rB This is a POWER instruction, and is not part of the PowerPC 
Extended srea. architecture. This instruction will not be supported by other 
Algebraic PowerPC implementations. 
Register rS is rotated left 32—n bits where nis the shift amount 
specified in bits 27-31 of register rB. A mask of n zeros followed by 
32—n ones is generated. The rotated word is placed in the MQ 
register. 
The rotated word is then merged with a word of 32 sign bits from 
register rS, under control of the generated mask. 
The merged word is placed in register rA. 
The rotated word is ANDed with the complement of the generated 
mask. This 32-bit result is ORed together and then ANDed with bit 0 
of register rS to produce XER[CA]. 
srea Shift Right Extended Algebraic 
srea. Shift Right Extended Algebraic with CR Update. The dot 
suffix enables the update of the condition register. 
This instruction is specific to the MPC601. 


Table 3-9. Integer Shift Instructions (Continued) 





























3.4 Floating-Point Instructions 


This section describes the floating-point instructions, which include the following: 


Floating-point arithmetic instructions 

Floating-point multiply-add instructions 
Floating-point rounding and conversion instructions 
Floating-point compare instructions 

Floating-point status and control register instructions 


Floating-point loads and stores are discussed in Section 3.5, “Load and Store Instructions.” 


3.4.1 Floating-Point Arithmetic Instructions 


Single-precision instructions execute faster than their double-precision equivalents in the 
MPC601. For additional details on floating-point performance, refer to Chapter 7, 
“Instruction Timing.” 
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The floating-point arithmetic instructions are summarized in Table 3-10. 


Table 3-10. Floating-Point Arithmetic Instructions 


Operand : 
j_tome ftamonc | Bre | mnt 


Floating- fadd frD,frA,ffrB | The floating-point operand in register frA is added to the floating- 

Point Add fadd. point operand in register frB. If the most significant bit of the resultant 
significand is not a one the result is normalized. The result is 
rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into register frD. 
Floating-point addition is based on exponent comparison and 
addition of the two significands. The exponents of the two operands 
are compared, and the significand accompanying the smaller 
exponent is shifted right, with its exponent increased by one for each 
bit shifted, until the two exponents are equal. The two significands 
are then added algebraically to form an intermediate sum. All 53 bits 
in the significand as well as all three guard bits (G, R, and X) enter 
into the computation. 
lf a carry occurs, the sum's significand is shifted right one bit position 
and the exponent is increased by one. 
FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 
fadd Floating-Point Add 
fadd. Floating-Point Add with CR Update. The dot suffix 

enables the update of the condition register. 


Floating- frD,frA,frB | The floating-point operand in register frAis added to the floating- 
Point Add point operand in register frB. If the most significant bit of the resultant 
Single- significand is not a one, the result is normalized. The result is 
Precision rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into register frD. 



















































Floating-point addition is based on exponent comparison and 
addition of the two significands. The exponents of the two operands 
are compared, and the significand accompanying the smaller 
exponent is shifted right, with its exponent increased by one for each 
bit shifted, until the two exponents are equal. The two significands 
are then added algebraically to form an intermediate sum. All 53 bits 
in the significand as well as all three guard bits (G, R, and X) enter 
into the computation. 


if a carry occurs, the sum's significand is shifted right one bit position 
and the exponent is increased by one. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fadds Floating-Point Single-Precision 
fadds. —_- Floating-Point Single-Precision with CR Update. The dot 
suffix enables the update of the condition register. 
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Table 3-10. Floating-Point Arithmetic Instructions (Continued) 


Operand : 
[ rame [amamenc | mt | even 


Floating- fsub frD,frA,frB | The floating-point operand in register frB is subtracted from the 
Point floating-point operand in register frA. If the most significant bit of the 
Subtract resultant significand is not a 1 the result is normalized. The result is 
rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into register frD. 























The execution of the Floating-Point Subtract instruction is identical to 
that of Floating-Point Add, except that the contents of register frB 
participates in the operation with its sign bit (bit 0) inverted. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 




















fsub Floating-Point Subtract 
fsub. Floating-Point Subtract with CR Update. The dot suffix 
enables the update of the condition register. 





























Floating- The floating-point operand in register frB is subtracted from the 


Point floating-point operand in register frA. If the most significant bit of the 
Subtract resultant significand is not a 1 the result is normalized. The result is 
Single- rounded to the target precision under control of the floating-point 


Precision rounding control field RN of the FPSCR and placed into register frD. 


The execution of the Floating-Point Subtract instruction is identical to 
that of Floating-Point Add, except that the contents of register frB 
participates in the operation with its sign bit (bit 0) inverted. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fsubs Floating-Point Subtract Single-Precision 
fsubs. —_ Floating-Point Subtract Single-Precision with CR Update. 
The dot suffix enables the update of the condition register. 


fsubs frD,frA,frB 
fsubs. 
frD,frA,frc 





Floating- 
Point 
Multiply 


The floating-point operand in register frA is multiplied by the floating- 
point operand in register frC. 


If the most significant bit of the resultant significand is not a one, the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


Floating-point multiplication is based on exponent addition and 
multiplication of the significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 





fmul Floating-Point Multiply 
fmul. Floating-Point Multiply with CR Update. The dot suffix 
enables the update of the condition register. 
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Table 3-10. Floating-Point Arithmetic Instructions (Continued) 


Operand ‘ 
[tone | umenone | Smet | croton 


Floating- frD,frA,frC | The floating-point operand in register frAis multiplied by the floating- 
Point point operand in register frC. 
esa If the most significant bit of the resultant significand is not a one the 
p bee result is normalized. The result is rounded to the target precision 
te under contro! of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 
Floating-point multiplication is based on exponent addition and 
multiplication of the significands. 
FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 
fmuls Floating-Point Multiply Single-Precision 
fmuls. — Floating-Point Multiply Single-Precision with CR Update. 
The dot suffix enables the update of the condition register. 
Floating- frD,frA,frB | The floating-point operand in register frAis divided by the floating- 
Point Divide point operand in register frB. No remainder is preserved. 
fdivs frD,frA,frB 
tdivs. 








































If the most significant bit of the resultant significand is not a 1, the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


Floating-point division is based on exponent subtraction and division 
of the significands. 


FPSCRI[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1 and Zero divide 
exceptions when FPSCR[ZE]=1. 


fdiv Floating-Point Divide 
fdiv. Floating-Point Divide with CR Update. The dot suffix 
enables the update of the condition register. 








Floating- 
Point 
Divide 
Single- 
Precision 


The floating-point operand in register frA is divided by the floating- 
point operand in register frB. No remainder is preserved. 


If the most significant bit of the resultant significand is not a 1, the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


Floating-point division is based on exponent subtraction and division 
of the significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1 and zero divide 
exceptions when FPSCR[ZE]=1. 


fdivs Floating-Point Divide Single-Precision 
fdivs. Floating-Point Divide Single-Precision with CR Update. 
The act suitix snasiss the update of the condition register. 
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3.4.2 Floating-Point Multiply-Add Instructions 


These instructions combine multiply and add operations without an intermediate rounding 
operation. The fractional part of the intermediate product is 106 bits wide, and all 106 bits 
take part in the add/subtract portion of the instruction. 


The floating-point multiply-add instructions are summarized in Table 3-11. 


Table 3-11. Floating-Point Multiply-Add Instructions 


Operand . 
| tone | memonc| Smt | pmo 


Floating- 
Point 
Multiply- 
Add 


Floating- fmadds 


Point fmadds. 


Multiply- 
Add 
Single- 
Precision 


Floating- 
Point 

Multiply- 
Subtract 
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frD,frA,frC,frB 


frD,frA,frC,frB 


frD,frA,frC,frB 
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The floating-point operand in register frAis multiplied by the floating- 
point operand in register frC. The floating-point operand in register 
frB is added to this intermediate result. 


If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fmadd Floating-Point Multiply-Add 
fmadd. Floating-Point Multiply-Add with CR Update. The dot 
suffix enables the update of the condition register. 


The floating-point operand in register frA is multiplied by the floating- 
point operand in register frC. The floating-point operand in register 
frB is added to this intermediate result. 


If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


FPSCRI[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fmadds Floating-Point Multiply-Add Single-Precision 

fmadds. Floating-Point Multiply-Add Single-Precision with CR 
Update. The dot suffix enables the update of the 
condition register. 


The floating-point operand in register frA is multiplied by the floating- 
point operand in register frC. The floating-point operand in register 
frB is subtracted from this intermediate result. 


If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


FPSCR[FPAF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCRI[VE]=1. 


Floating-Point Multiply-Subtract 
Floating-Point Multiply-Subtract with CR Update. The dot 
suffix enables the update of the condition register. 


fmsub 
fmsub. 
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Table 3-11. Floating-Point Multiply-Add Instructions (Continued) 


Operand , 
jane mmene | rem | mtn 


Floating- fmsubs frD,frA,frC,frB | The floating-point operand in register frAis multiplied by the floating- 
Point fmsubs. point operand in register frC. The floating-point operand in register 
Multiply- frB is subtracted from this intermediate result. 
Subtract 
Single- 
Precision 








































If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR and placed into register frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fmsubs Floating-Point Multiply-Subtract Single-Precision 
fmsubs. Floating-Point Multiply-Subtract Single-Precision with CR 
Update. The dot suffix enables the update of the 
condition register. 


































fnmadd 
fnmadd. 


Floating- 
Point 
Negative 
Multiply- 
Add 


frD,frA,frC,frB | The floating-point operand in register frAis multiplied by the floating- 
point operand in register frC. The floating-point operand in register 


frB is added to this intermediate result. 





If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR, then negated and placed into register frD. 


This instruction produces the same result as would be obtained by 
using the floating-point multiply-add instruction and then negating the 
result, with the following exceptions: 


¢ QNaNs propagate with no effect on their sign bit. 

¢ QNaNs that are generated as the result of a disabled invalid 
operation exception have a "sign" bit of zero. 

¢ SNaNs that are converted to QNaNs as the result of a disabled 
invalid operation exception retain the "sign" bit of the SNaN. 

FPSCR[FPRAF] is set to the class and sign of the result, except for 

invalid operation exceptions when FPSCR[VE] - 4, 


fnmadd Floating-Point Negative Multiply-Add 
fnmadd. Floating-Point Negative Multiply-Add with CR Update. 
The dot suffix enables the update of the condition register. 
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Table 3-11. Floating-Point Multiply-Add Instructions (Continued) 


Floating- fnmadds frD,frA,frC,frB | The floating-point operand in register frAis multiplied by the floating- 
Point fnmadds. point operand in register frC. The floating-point operand in register 
Negative frB is added to this intermediate result. 

Multiply- 
Add 
Single- 
Precision 


If the most significant bit of the resultant significand is not a one the 
result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR, then negated and placed into register frD. 


This instruction produces the same result as would be obtained by 
using the floating-point multiply-add instruction and then negating the 
result, with the following exceptions: 


QNaNs propagate with no effect on their sign bit. 

QNaNs that are generated as the result of a disabled invalid 
operation exception have a "sign" bit of zero. 

SNaNs that are converted to QNaNs as the result of a disabled 
invalid operation exception retain the “sign” bit of the SNaN. 


FPSCR[FPRAF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE] - 1, 


fnmadds Fioating-Point Negative Multiply-Add Single-Precision 
fnmadds. Floating-Point Negative Multiply-Add Single-Precision with 


CR Update. The dot suffix enables the update of the 
condition register. 


Floating- fnmsub frD,frA,frC,frB | The floating-point operand in register frA is multiplied by the floating- 
Point fnmsub. point operand in register frC. The floating-point operand in register 
Negative frB is subtracted from this intermediate result. 

Multiply- 


If the most significant bit of the resultant significand is not a one the 
Subtract 


result is normalized. The result is rounded to the target precision 
under control of the floating-point rounding control field RN of the 
FPSCR, then negated and placed into register frD. 


This instruction produces the same result as would be obtained by 
using the floating-point multiply-subtract instruction and then negating 
the result, with the following exceptions: 


QNaNs propagate with no effect on their sign bit. 

QNaNSs that are generated as the result of a disabled invalid 

operation exception have a sign bit of zero. 

SNaNs that are converted to QNaNs as the result of a disabled 

invalid operation exception retain the sign bit of the SNaN. 
FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


fnmsub Floating-Point Negative Multiply-Subtract 
fnmsub. Floating-Point Negative Multiply-Subtract with CR Update. 
The dot suffix enables the update of the condition register. 
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Table 3-11. Floating-Point Multiply-Add Instructions (Continued) 


Floating- fnmsubs 
Point fnmsubs. point operand in register frC. The floating-point operand in register 
Negative frB is subtracted from this intermediate result. 


ee if the most significant bit of the resultant significand is not a one the 

Sinal result is normalized. The result is rounded to the target precision 

eee under control of the floating-point rounding control field RN of the 
FPSCR, then negated and placed into register frD. 


This instruction produces the same result as would be obtained by 
using the floating-point multiply-subtract instruction and then negating 
the result, with the following exceptions: 


QNaNs propagate with no effect on their “sign” bit. 
QNaNs that are generated as the result of a disabled invalid 
operation exception have a “sign” bit of zero. 

SNaNs that are converted to QNaNs as the result of a disabled 
invalid operation exception retain the “sign” bit of the SNaN. 
FPSCR[FPRF] is set to the class and sign of the result, except for 

invalid operation exceptions when FPSCR[VE]=1. 


fnmsubs Floating-Point Negative Multiply-Subtract Single-Precision 

fnmsubs. Floating-Point Negative Multiply-Subtract Single- 
Precision with CR Update. The dot suffix enables the 
update of the condition register. 





3.4.3 Floating-Point Rounding and Conversion Instructions 


The floating-point rounding instruction is used to produce a 32-bit single-precision number 
from a 64-bit double-precision floating-point number. The floating-point convert 
instructions convert 64-bit double-precision floating point numbers to 32-bit signed integer 
numbers. 


On Floating-Point Convert to Integer Word (fctiw) and Floating-Point Convert to Integer 
Word with Round toward Zero (fctiwz), the PowerPC architecture defines bits 0-31 of 
floating-point register frD as undefined. In the MPC601, these bits take on the value 
x'FFF8 0000' (which is the representation for a QNaN). This value may differ in future 
PowerPC processors, and software should avoid dependence on this MPC601 feature. 


The floating-point rounding instructions are shown in Table 3-12. 
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Examples of uses of these instructions to perform various conversions can be found 1 in 
Appendix F, “Floating-Point Models.” 


Table 3-12. Floating-Point Rounding and Conversion Instructions 


Operand ; 
j_tene | umenone | rent) oven 


Floating- If it is already in single-precision range, the floating-point operand in 
Point . register frB is placed into register frD. Otherwise the floating-point 
Round to operand in register frB is rounded to single-precision using the 
Single- rounding mode specified by FPSCR[RN] and placed into register frD. 


Precision The rounding is described fully in Appendix F, “Floating-Point 


Models.” 


FPSCR[FPRF] is set to the class and sign of the result, except for 
invalid operation exceptions when FPSCR[VE]=1. 


frsp Floating-Point Round to Single-Precision 

frsp. Floating-Point Round to Single-Precision with CR 
Update. The dot suffix enables the update of the 
condition register. 


Floating- The floating-point operand in register frB is converted to a 32-bit 
Point signed integer, using the rounding mode specified by FPSCA[RN], 
Convert to and placed in bits 32-63 of register frD. Bits 0-31 of register frD are 
Integer undefined. 


Word If the operand in register frB is greater than 2°! - 1, bits 32-63 of 


register frD are set to x ‘7FFF_FFFF'. 


If the operand in register frB is less than -29', bits 32~63 of register 
frD are set to x ‘8000_0000'. 


The conversion is described fully in Appendix F, “Floating-Point 
Models.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] 
is undefined. FPSCR[FR] is set if the result is incremented when 
rounded. FPSCRI[F] is set if the result is inexact. 


fctiw Floating-Point Convert to Integer Word 
fctiw. Floating-Point Convert to Integer Word with CR Update. 
The dot suffix enables the update of the condition register. 
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Table 3-12. Floating-Point Rounding and Conversion Instructions (Continued) 


Floating- The floating-point operand in register frB is converted to a 32-bit 
Point signed integer, using the rounding mode Round toward Zero, and 

Convert to placed in bits 32-63 of register frD. Bits 0-31 of register frD are 

Integer undefined. 

Word with 


Sauna If the operand in frB is greater than 2°" - 1, bits 32-63 of frD are set 


to x ‘7FFF_FFFF'. 


if the operand in register frB is less than -2°', bits 32-63 of register 
frD are set to x ‘8000_0000'. 


The conversion is described fully in Appendix F, “Floating-Point 
Models.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] 
is undefined. FPSCR[FR] is set if the result is incremented when 
rounded. FPSCR[F] is set if the result is inexact. 


fctiwz Floating-Point Convert to Integer Word with Round Toward 
Zero 

fctiwz. —_ Floating-Point Convert to Integer Word with Round Toward 
Zero with CR Update. The dot suffix enables the update 
of the condition register. 





3.4.4 Floating-Point Compare Instructions 


Floating-point compare instructions compares the contents of two floating-point registers 
and the comparison ignores the sign of zero (that is +0 = -0). The comparison can be 
ordered or unordered. The comparison sets one bit in the designated CR field and clears the 
other three bits. The FPCC (floating-point condition code; bits 16-19 in the floating-point 
status and control register) is set in the same way. 


The CR field and the FPCC are interpreted as shown in Table 3-13. 
Table 3-13. CR Bit Settings 


[a [vee [omen 
[eR _[waewe 
Fe 
[= [w= 
fs [ [wa 7 roe 


On Floating-Point Compare Unordered (fempu) and Floating-Point Compare Ordered 
(fciipo) instructions with condition register updating enabled, the PowerPC architecture 
defines CR1 and the CR field specified by operand erfD as undefined. 













The floating-point compare instructions are summarized in Table 3-14. 
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Table 3-14. Floating-Point Compare Instructions 


Operand . 
frame | amamenc | Bammer 


Floating- crfD,frA,frB | The floating-point operand in register frAis compared to the floating- 

Point point operand in register frB. The result of the compare is placed into 

Compare CR field crfD and the FPCC. 

Unordered If an operand is a NaN, either quiet or signalling, CR field crfD and 
the FPCC are set to reflect unordered. If an operand is a Signalling 
NaN, VXSNAN is set. 

Floating- a crfD,frA,frB | The floating-point operand in register frAis compared to the floating- 


Point point operand in register frB. The result of the compare is placed into 
Compare CR field crfD and the FPCC. 
Ordered If an operand is a NaN, either quiet or signalling, CR field crfD and 
the FPCC are set to reflect unordered. If an operand is a Signalling 
NaN, VXSNAN is set, and if invalid operation is disabled (VE=0) then 
VXVC is set. Otherwise, if an operand is a Quiet NaN, VXVC is set. 
3.4.5 Floating-Point Status and Control Register Instructions 
Every FPSCR instruction appears to synchronize the effects of all floating-point 
instructions executed by a given processor. Executing an FPSCR instruction ensures that 
all floating-point instructions previously initiated by the given processor appear to have 
completed before the FPSCR instruction is initiated and that no subsequent floating-point 
instructions appear to be initiated by the given processor until the FPSCR instruction has 
completed. In particular: 




















¢ All exceptions caused by the previously initiated instructions are recorded in the 
FPSCR before the FPSCR instruction is initiated. 


¢ All invocations of the floating-point exception handler that caused by the previously 
initiated instructions have occurred before the FPSCR instruction is initiated. 


¢ Nosubsequent floating-point instruction that depends on or alters the settings of any 
FPSCR bits appears to be initiated until the FPSCR instruction has completed. 


Floating-point memory access instructions are not affected. 


The floating-point status and control register instructions are summarized in Table 3-15. 
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Table 3-15. Floating-Point Status and Control Register Instructions 


Operand : 
frame [menonc| Somat fate 


Move from The contents of the FPSCR are placed into bits 32-63 of register frD. 
In the MPC601, bits 0-31 of floating-point register frD are set to the 
value x 'FFFF_FFFF'. 


mffs Move from FPSCR 
mffs. Move from FPSCR with CR Update. The dot suffix 
enables the update of the condition register. 












The contents of FPSCR field specified by operand crfS are copied to 
the CR field specified by operand crfD. All exception bits copied are 
cleared to zero in the FPSCR. 
















Condition 
Register 
from FPSCR 


The value of the IMM field is placed into FPSCR field crfD. All other 
FPSCR fields are unchanged. 


mtfsfi Move to FPSCR Field Immediate 
mtfsfi. | Move to FPSCR Field Immediate with CR Update. The 
dot suffix enables the update of the condition register. 

When FPSCR[0-3] is specified, bits 0 (FX) and 3 (OX) are set to the 
values of IMM[O] and IMM{[3] (i.e., even if this instruction causes OX 
to change from 0 to 1, FX is set from IMM{[0] and not by the usual rule 
that FX is set to 1 when an exception bit changes from 0 to 1). Bits 1 
and 2 (FEX and VX) are set according to the usual rule described in 
2.2.3, “Floating-Point Status and Control Register (FPSCR),” and not 
from IMM[1—2]. 
























FPSCR 
Field 
Immediate 
















Bits 32-63 of register frB are placed into the FPSCR under control of 
the field mask specified by FM. The field mask identifies the 4-bit 
fields affected. Let /be an integer in the range 0-7. If FM=1 then 
FPSCR field i (FPSCR bits 4*/ through 4*/+3) is set to the contents 
of the corresponding field of the low-order 32 bits of register frB. 


mtfsf Move to FPSCR Fields 

mtfsf. Move to FPSCR Fields with CR Update. The dot suffix 
enables the update of the condition register. 

In other PowerPC implementations, the mtfsf instruction may 

perform more slowly when only a portion of the fields are updated. 

This is not the case in the MPC601. 


When FPSCR(0-3] is specified, bits 0 (FX) and 3 (OX) are set to the 
values of frB[32] and frB[35] (i.e., even if this instruction causes OX 
to change from 0 to 1, FX is set from frB[32] and not by the usual rule 
that FX is set to 1 when an exception bit changes from 0 to 1). Bits 1 
and 2 (FEX and VX) are set according to the usual rule described in 

2.2.3, “Floating-Point Status and Control Register (FPSCR),” and not 
from frB[33-—34]. 







FPSCR 
Fields 




















on ££ nH 
tilkiouy 


mtfsbo. 





FPSCR Bit 0 






mtfsb0 Move to FPSCR Bit 0 
mtfsb0. Move to FPSCR Bit 0 with CR Update. The dot suffix 
enables the update of the condition register. 
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Table 3-15. Floating-Point Status and Control Register Instructions (Continued) 


Operand ‘ 
jinenone| emt forms | 


The bit of the FPSCR specified by operand crbD is set to 1. 
Bits 1 and 2 (FEX and VX) cannot be reset explicitly. 


mtfsb1 Move to FPSCR Bit 1 
mtfsb1. Move to FPSCR Bit 1 with CR Update. The dot suffix 
enables the update of the condition register. 









Move to 
FPSCR Bit 1 








3.5 Load and Store Instructions 


This section describes the load and store instructions of the MPC601, which consist of the 
following: 

¢ Integer load instructions 

¢ Integer store instructions 

e Integer load and store with byte reversal instructions 

e Integer load and store multiple instructions 

e Floating-point load instructions 

¢ Floating-point store instructions 

¢ Floating-point move instructions 

e¢ Memory synchronization instructions 


3.5.1 Integer Load and Store Address Generation 


Integer load and store operations generate effective addresses using register indirect with 
immediate index mode, register indirect with index mode or register indirect mode. 


3.5.1.1 Register Indirect with Immediate Index Addressing 

Instructions using this addressing mode contain a signed 16-bit immediate index (d 
operand) which is sign extended to 32 bits, and added to the contents of a general purpose 
register specified in the instruction (rA operand) to generate the effective address. A zero 
in place of the rA operand causes a zero to be added to the immediate index (d operand). 
The option to specify rA or 0 is shown in the instruction descriptions as (rAlOQ). 


Figure 3-1 shows how an effective address is generated when using register indirect with 
immediate index addressing. 
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67 1112 16 17 


comm fore] a [a 


Instruction encode 

















GPR 





(rA) 


Effective Address 


Memory 


Figure 3-1. Register Indirect with Immediate Index Addressing 


GPR (rD/rS) 


3.5.1.2 Register Indirect with Index Addressing 


Instructions using this addressing mode cause the contents of two general purpose registers 
(specified as operands rA and rB) to be added in the generation of the effective address. A 
zero in place of the rA operand causes a zero to be added to the contents of the general 
purpose register specified in operand rB. The option to specify rA or () is shown in the 
instruction descriptions as (rAlQ). 


Figure 3-2 shows how an effective address is generated when using register indirect with 
index addressing. 


1112 1617 21 22 30 31 


Instruction Encoding: Tora [ove] ma | w | SoD 





31 





GPR (rB) 


No +) 








0 31 
GPR (7A Effective Address 
0 31 
Store Memory 
GPR (rD/rS) Access 





Figure 3-2. Register Indirect with Index Addressing 
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3.5.1.3 Register Indirect Addressing 

Instructions using this addressing mode use the contents of the general purpose register 
specified by the rA operand as the effective address. A zero in the rA operand causes an 
effective address of zero to be generated. The option to specify rA or 0 is shown in the 
instruction descriptions as (rAl0). 


Figure 3-3 shows how an effective address is generated when using register indirect 
addressing. 


67 1112 1617 21 22 30 31 


(Sree [ore] cm [ne | sini 










GPR (rA) 


31 


Effective Address 
Memory 


Figure 3-3. Register Indirect Addressing 





GPR (rD/rs) 





3.5.2 Integer Load Instructions 


For load instructions, the byte, half-word, word, or double-word addressed by EA is loaded 
into rD, Many integer load instructions have an update form, in which rA is updated with 
the generated effective address. For these forms, if rA # 0 and rA # rD, the effective 
address is placed into rA and the memory element (byte, half-word, or word) addressed by 
EA is loaded into rD. 


Note that non-MPC601 implementations of the architecture may run the load half algebraic 


instructions (Iha, Ihax) and the load with update (Ibzu, Ibzux, Ihzu, Ihzux, Ihau, Ihaux) 
instructions with greater latency than other types of load instructions. In the MPC601, all 
of these instructions operate with the same latency as other load instructions. For details on 
instruction timing, see Chapter 7, “Instruction Timing.” 
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The PowerPC architecture defines load with update instructions with rA=0 or rA=rD as an 
invalid form. In the POWER architecture, these forms are not considered invalid and 
specifications exist for these cases. To maintain compatibility with the POWER 
architecture, for the case where rA=0, the MPC601 does not update r0. In cases where 
rA=rD, the load data is loaded into rD and the register rA update is suppressed. In addition, 
the PowerPC architecture defines integer load instructions with the condition register 
update option enabled to be an invalid form and the POWER architecture does not. For 
compatibility, the MPC601 executes the instruction in a manner consistent with the 
PowerPC architecture and it causes an undefined value to be placed into the condition 
register CRO field. 


Table 3-16 summarizes the load instructions available for the MPC601. 


Table 3-16 Integer Load Instructions 
Operand . 
jteme | monte | rent | pete 
Load Byte rD,d(rA) The effective address is the sum (rA|0)+d. The byte in memory 
and Zero addressed by the EA is loaded into register rD[24—31]. The remaining 
bits in register rD are cleared to 0. 


Load Byte rD,rA,rB The effective address is the sum (rA|0)+(rB). The byte in memory 


and Zero addressed by the EA is loaded into register rD[24—31]. The remaining 
Indexed bits in register rD are cleared to 0. 

i - 
iv 


rD,d(rA) 
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The effective address (EA) is the sum (rA|0)+d. The byte in memory 
addressed by the EA is loaded into register rD[24—31]. The remaining 
bits in register rD are cleared to 0. The EAis placed into register rA. If 
operand rA=0 the MPC601 does not update 10, or if rA=rD the load 
data is loaded into register rD and the register update is suppressed. 
Although the PowerPC architecture defines load with update 
instructions with operand rA=0 or rA=rD as invalid forms, the 
MPC6E01 allows these cases. 


The effective address (EA)is the sum (rA]0)+(rB). The byte 
addressed by the EA is loaded into register rD[24—31]. The remaining 
bits in register rD are cleared to 0. The EAis placed into register rA. If 
operand rA=0 the MPC601 does not update register r0, or if rA=rD 
the load data is loaded into register rD and the register update is 
suppressed. Although the PowerPC architecture defines load with 
update instructions with operand rA=0 or rA=rD as invalid forms, the 
MPC601 allows these cases. 


Load Byte 
and Zero 
with Update 
















Load Byte 
and Zero 
with 
Update 
Indexed 

























The effective address is the sum (rA|0)+d. The half-word in memory 
addressed by the EA is loaded into register rD[16-31]. The remaining 
bits in rD are cleared to 0. 


Load 
Half Word 
and Zero 


Load 
Half Word 
and Zero 
Indexed 














The effective address is the sum (rA]0)+(rB). The half-word in 
memory addressed by the EA is loaded into register rD[16-31]. The 
remaining bits in register rD are cleared. 









Load rD,rA,rB The effective address is the sum (rA|0)+(rB). The half-word in 
Half Word memory addressed by the EA is loaded into register rD[16-31]. The 
and Zero remaining bits in register rD are cleared. The EA is placed into 
with register rA. Although the PowerPC architecture defines load with 
Update update instructions with operand rA=0 or rA=rD as invalid forms, the 
Indexed MPC601 allows these cases. 
Load rD,d(rA) The effective address is the sum (rA)+d. The half-word in memory 
Half Word addressed by the EAis loaded into register rD[16-31]. The remaining 
Algebraic bits in register rD are filled with a copy of bit 0 of the loaded half-word. 
Load rD,rA,rB The effective address is the sum (rA]0)+(rB). The half-word in 
Half Word memory addressed by the EA is loaded into register rD[16-31]. The 
Algebraic remaining bits in register rD are filled with a copy of bit 0 of the loaded 
Indexed half-word. 
Load rD,d(rA) The effective address is the sum (rA|0)+d. The half-word in memory 
Half Word addressed by the EAis loaded into register rD[16-31]. The remaining 
Algebraic bits in register rD are filled with a copy of bit 0 of the loaded half-word. 
with Update The EA is placed into register rA. If operand rA=0 the MPC601 does 
not update register r0, or if rA=rD the load data is loaded into register 
rD and the register update is suppressed. Although the PowerPC 
architecture defines load with update instructions with operand rA=0 
or rA=rD as invalid forms, the MPC601 allows these cases. 
rD,rA,rB The effective address is the sum (rA]0)+(rB). The half-word in 
Half Word memory addressed by the EA is loaded into register rD[16-31]. The 
Algebraic remaining bits in register rD are filled with a copy of bit 0 of the loaded 
with half-word. The EAis placed into register rA. If operand rA=0 the 
Update MPCE01 does not update r0, or if rA=rD the load data is loaded into 
Indexed register rD and the register update is suppressed. Although the 
PowerPC architecture defines load with update instructions with 
operand rA=0 or rA=rD as invalid forms, the MPC601 allows these 
cases. 
‘| Load Word rD,d(rA) The effective address is the sum (rA]0)+d. The word in memory 
and Zero addressed by the EA is loaded into register rD[0-31]. 
Load Word rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory 
and Zero addressed by the EAis loaded into register rD[0-31]. 
Indexed 


Table 3-16 Integer Load Instructions (Continued) 


Operand : 
[ene | venoms | amme f atom 


Load rD,d(rA) The effective address is the sum (rA]0)+d. The half-word in memory 
Half Word addressed by the EAis loaded into register rD[16—31]. The remaining 
and Zero bits in register rD are cleared. 
with Update The EAis placed into register rA. 
lf operand rA=0 the MPC601 does not update register r0, or if rA=rD 
the load data is loaded into register rD and the register update is 
suppressed, Although the PowerPC architecture defines load with 
update instructions with operand rA=0 or rA=rD as invalid forms, the 
MPC601 allows these cases. 
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Table 3-16 Integer Load Instructions (Continued) 


Operand ; 
ee ae 


Load Word rD,d(rA) The effective address is the sum (rA|0)+d. The word in memory 

and Zero addressed by the EA is loaded into register rD[0O—31]. The EAis 

with Update placed into register rA. If operand rA=0 the MPC601 does not update 
register r0, or if rA=rD the load data is loaded into register rD and the 
register update is suppressed. Although the PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as 
invalid forms, the MPC601 allows these cases. 























Load Word rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory 

and Zero addressed by the EA is loaded into register rD[0-31]. The EAis 

with placed into register rA. If operand rA=0 the MPC601 does not update 

Update register r0, or if rA=rD the load data is loaded into register rD and the 

Indexed register update is suppressed. Although the PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as 
invalid forms, the MPC601 allows these cases. 





3.5.3 integer Store Instructions 


For integer store instructions, the contents of register rS are stored into the byte, half-word, 
word or double-word in memory addressed by EA. Many store instructions have an update 
form, in which register rA is updated with the effective address. For these forms, the 
following rules apply: 


¢ IfrA#0, the effective address is placed into register rA. 


¢ IfrS=rA, the contents of register rS are copied to the target memory element, then 
the generated EA is placed into rA. 


The PowerPC architecture defines store with update instructions with rA=0 as an invalid 
form. In the POWER architecture, this form is not considered invalid and specifications 
exist for these cases. To maintain compatibility with POWER in this case, the MPC601 
does not update register r0. In addition, PowerPC defines integer store instructions with the 
condition register update option enabled to be an invalid form and the POWER architecture 
does not. To maintain compatibility in these cases, the MPC601 executes the instruction as 
described in the PowerPC architecture, and it loads an undefined value into CRO field of the 
condition register. 


A summary of the integer store instructions provided by the MPC601 is shown in 
Table 3-17. 


Table 3-17. Integer Store Instructions 
Operand : 
jimonone | ame | open 
Store Byte rS,d(rA) The effective address is the sum (rA|0)+d. Register rS[24—31] is 
stored into the byte in memory addressed by the EA. 
Store Byte rS,rAjrB The effective address is the sum (rA|0)+(rB). rS[24—31] is stored into 
Indexed the byte in memory addressed by the EA. 
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Table 3-17. Integer Store Instructions (Continued) 


Operand F 

SS eee 
Store Byte rS,d(rA) The effective address is the sum (rA|0)+d. rS[24—31] is stored into 
with Update the byte in memory addressed by the EA. The EA is placed into 

register rA. 

Store Byte rS,rA,rB The effective address is the sum (rA|0)+(rB). rS[24—31] is stored into 
with the byte in memory addressed by the EA. The EA is placed into 
Update register rA. 
Indexed 
Store rS,d(rA) The effective address is the sum (rA|0)+d. rS[16-31] is stored into 
Half word the half-word in memory addressed by the EA. 

Store rsS,rA,rB The effective address (EA) is the sum (rA[0)+(rB). rS[16—31] is stored 

half-word into the half-word in memory addressed by the EA. 

Indexed 

Store rS,d(rA) 

Half word 

with Update 

Store rS,tA,rB 

Half word 

with 

Update 

Indexed 
Store Word rS,d(rA) The effective address is the sum (rA|0)+d. Register rS is stored into 

the word in memory addressed by the EA. 

Store Word rS,rAjrB The effective address is the sum (rA|0)+(rB). rS is stored into the 
Indexed word in memory addressed by the EA. 

Store Word rS,d(rA) The effective address is the sum (rA|0)+d. 

with Update 










































The effective address is the sum (rA|0)+d. rS[16-31] is stored into 
the half-word in memory addressed by the EA. The EA is placed into 
register rA. 








The effective address is the sum (rA]0)+(rB). rS[16—31] is stored into 
the half-word in memory addressed by the EA. The EA is placed into 
register rA. 






























Register rS is stored into the word in memory addressed by the EA. 
The EA is placed into register rA.. 













Store Word rs,rA,rB The effective address is the sum (rA|0)+(rB). Register rS is stored 
with into the word in memory addressed by the EA. The EA is placed into 
Update register rA. 

Indexed 


3.5.4 Integer Load and Store with Byte Reversal Instructions 


Table 3-18 describes integer load and store with byte reversal instruction. Note that in other 
PowerPC implementations, load byte-reverse instructions may have greater latency than 
other load instructions. 
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This is not the case in the MPC601. These instructions operate with the same latency as 
other load instructions. 


Table 3-18. Integer Load and Store with Byte Reversal Instructions 


Operand ‘ 

} name | mononc | Brent | atom | 
Load rD,rA,rB The effective address is the sum (rA|0)+(rB). Bits 0-7 of the half- 
Half Word word in memory addressed by the EA are loaded into rD[24—31]. 
Byte- Bits 8-15 of the half-word in memory addressed by the EA are 
Reverse loaded into rD[16—23]. The rest of the bits in rD are cleared to 0. 
Indexed 

ae 





























Load Word The effective address is the sum (rAj0)+(rB). Bits 0-7 of the word 
Byte- in memory addressed by the EA are loaded into rD[24—31]. Bits 
Reverse 8-15 of the word in memory addressed by the EA are loaded into 
Indexed rD[16—23]. Bits 16-23 of the word in memory addressed by the 






EA are loaded into rD[8—15]. Bits 24-31 of the word in memory 
addressed by the EA are loaded into rD[0-7]. 

























Store rS,rA,rB The effective address is the sum (rA|0)+(rB). rS[24—31] are stored 
Half Word into bits 0-7 of the half-word in memory addressed by the EA. 
Byte- rS[16-23] are stored into bits 8—15 of the half-word in memory 
Reverse addressed by the EA. 

Indexed 









Store Word The effective address is the sum (rA]0)+(rB). rS[24—31] are stored 
Byte- into bits 0-7 of the word in memory addressed by EA. Register 
Reverse rS[16-23] are stored into bits 8-15 of the word in memory 
Indexed addressed by the EA. Register rS[8-15] are stored into bits 16-23 






of the word in memory addressed by the EA. rS[0—7] are stored 
into bits 24-31 of the word in memory addressed by the EA. 






an 


3.5.5 Integer Load and Store Multiple Instructions 


The load/store multiple instructions are used to move blocks of data to and from the GPRs. 
The load multiple and store multiple instructions may have operands that require memory 
accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be 
interrupted by a data access exception associated with the address translation of the second 
page. In this case, the MPC6(1 performs all of the memory references from the first page, 
and none of the memory references from the second page before taking the exception. For 
additional information, refer to Section 5.4.3, “Data Access Exception (x’00300’).” 





The PowerPC architecture defines the load multiple instruction (Imw) with rA in the range 
of registers to be loaded as an invalid form. In the POWER architecture, this form is not 
considered invalid and specifications exist for these cases. To maintain compatibility with 
the POWER architecture in this case, the MPC601 will execute the instruction normally, 
except that the loading of register rA is skipped. If rA=0, the register is not considered to 
be actually used for addressing, and the update of r0 (if it is in the range of registers to be 
loaded) is loaded. In addition, the PowerPC architecture defines the load multiple and store 
multiple instructions with misaligned operands (that is, the EA is not a multiple of 4) to be 
an invalid form and the POWER architecture does not. To maintain compatibility with the 
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POWER architecture, the MPC601 executes these instructions subject to the performance 
degradation as described in 5.4.6.1, “Integer Alignment Exceptions.” 


Table 3-19. Integer Load and Store Multiple Instructions 


Operand : 
[_wone | meen | rim | pete 


Load rD,d(rA) The effective address is the sum (rA|0)+d. 

Multiple 2 

Word n= 32 4D. 
nconsecutive words starting at EA are loaded into GPRs rD through 
31. If the EAis not a multiple of 4 the alignment exception handler 
may be invoked if a page boundary is crossed. 

Store a rS,d(rA) The effective address is the sum (rA|0)+d. 


Multiple = 
Word n= (32-"8). 
nconsecutive words starting at the EA are stored from GPRs rS 
through 31. 
If the EAis not a multiple of 4 the alignment exception handler may 
be invoked if a page boundary is crossed. 
3.5.6 Integer Move String Instructions 
The integer move string instructions allow movement of data from memory to registers or 
from registers to memory without concern for alignment. These instructions can be used for 
a short move between arbitrary memory locations or to initiate a long move between 
misaligned memory fields. 
























Load/store string indexed instructions of zero length have no effect, except that load string 
indexed instructions of zero length may set register rD to an undefined value. 


Load string and store string instructions may involve operands that are not word-aligned. 
As described in Section 5.4.6, “Alignment Exception (x’00600’),” misaligned string 
operations will suffer a performance penalty as compared to an aligned operation of the 
same type. Non-word-aligned string operations that cross a 4-Kbyte boundary as well as 
word-aligned string operations that cross a 256-Mbyte boundary always cause an alignment 


_ exception. Other non-word-aligned string operations that cross a double-word boundary 


also are slower than word-aligned string operations. 


Although string operations that are word-aligned and cross a 4-Kbyte boundary operate at 
the MPC601's fastest rate, these instructions may be interrupted by a data access exception 
associated with the address translation of the second page. In this case, the MPC601 
performs all memory references from the first page and none from the second before taking 
the exception. For more information, refer to Section 5.4.3, “Data Access Exception 
(x’00300’).” 


The Load String and Compare Byte Indexed (Iscbx) instruction can lead to several 
architecturally undefined results. When the last register loaded is only partially filled, the 
remaining bytes are considered to be undefined. If loading is terminated due to a byte 
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match, all succeeding bytes are considered to be undefined. In addition, if the condition 
register update option is enabled, and XER[25-31]=0, condition register field CRO is 
undefined. In all of these cases, the MPC601 does not guarantee particular results for these 
undefined fields. The values should simply be treated as undefined. 


If the EA associated with an Iscbx instruction is directed to a memory-forced I/O controller 
interface segment (that is, the segment register T-bit is set and the BUID field equals 

x ‘OQ7F’), the address is translated appropriately and the operation proceeds. On the other 
hand, if the EA associated with an Iscbx instruction is directed to an I/O segment (that is, 
the segment register T-bit is set but the BUID does not equal x’07F‘), then the MPC601 
takes a data access exception and sets bit 5 of the DSISR. 


If rA is in the range of registers to be loaded for a Load String Word Immediate (Iswi) 
instruction or if either rA or rB are in the range of registers to be loaded for a Load String 
Word Indexed (Iswx) or Iscbx instruction, then the PowerPC architecture considers the 
instruction to be of an invalid form. In the POWER architecture, this form is not considered 
invalid and specifications exist for these cases. To maintain compatibility with the POWER 
architecture in this case, the MPC6()1 executes the instruction normally, but loading of 
these registers is inhibited. In addition, the Iswx, Iscbx and stswx instructions that specify 
a string length of zero are considered an invalid form in the PowerPC architecture, but not 
in the POWER architecture. For compatibility with the POWER architecture, the MPC6()1 
executes these instructions normally, but does not alter register rD or cause a memory 
access. 


Table 3-20. Integer Move String Instructions 


Operand : 
frame [rmonone | Smet | epee 


Load String The EAis (rA|0). 


ae o Let n= NB if NB¥0, n = 32 if NB=0; nis the number of bytes to load. 
cio Let nr = (n/4); nris the number of registers to receive data. 
















nconsecutive bytes starting at the EA are loaded into GPRs rD 
through rD+nr-1. Bytes are loaded left to right in each register. The 
sequence of registers wraps around to 10 if required. If the four bytes 
of register rD+nr-1 are only partially filled, the unfilled low-order 
byte(s) of that register are cleared to 0. 










If rA is in the range of registers specified to be loaded, it will be 
skipped in the load process. If operand rA=0, the register is not 
considered as used for addressing, and will be loaded. 
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Table 3-20. Integer Move String Instructions (Continued) 


Operand : 
frame [memonc] mest | orem 


Load String rD,rA,rB The EAis the sum (rA[0)+(rB). 

Word Let n = XER[25-31]; nis the number of bytes to load. 

Indexed 
Let nr = CEIL(n/4); nris the number of registers to receive data. 
If n>0, n consecutive bytes starting at the EA are loaded into registers 
rD through rD+nr-1. 
Bytes are loaded left to right in each register. The sequence of 
registers wraps around to r0 if required. If the four bytes of register 
rD+nr-1 are only partially filled, the unfilled low-order byte(s) of that 
register are cleared to 0. 
If n=0, the contents of register rD is undefined. 
If rA is in the range of registers specified to be loaded, it will be 
skipped in the load process. If operand rA=0, the register is not 
considered as used for addressing, and will be loaded. 

Load String The EAis the sum (rA|0)+(rB). XER[25-31] contains the byte count. 

Iscbx. 


a - 












































and Register rD is the starting register. n=XER[25-31], which is the 
Compare number of bytes to be loaded. nr=CEIL(n/4), which is the number of 
Byte registers to receive data. Starting with the leftmost byte in rD, 
Indexed consecutive bytes in storage addressed by the EA are loaded into rD 


through rD+nr-1, wrapping around back through GPR 0 if required, 
until either a byte match is found with XER[16—23] or n bytes have 
been loaded. If a byte match is found, that byte is also loaded. 






Bytes are always loaded left to right in the register. In the case when 
a match was found before n bytes were loaded, the contents of the 
rightmost byte(s) not loaded of that register and the contents of all 
succeeding registers up to and including rD+nr-1 are undefined. Also, 
no reference is made to storage after the matched byte is found. In 
the case when a match was not found, the contents of the rightmost 
byte(s) not loaded of rD+nr-1 is undefined. 


When XER[25-31]=0, the content of rD is unchanged. The count of 
the number of bytes loaded up to and including the matched byte, if a 
match was found, is placed in XER[25-31]. 


Iscbx Load String and Compare Byte Indexed 

Iscbx. Load String and Compare Byte Indexed with CR 
Update. The dot suffix enables the update of the 
condition register. 





Store The EAis (rA/0). 
ae Let n= NB if NBz0, n = 32 if NB=0; nis the number of bytes to store. 
Immediate Let nr = CEIL(n/4); nris the number of registers to supply data. 


nconsecutive bytes starting at the EA are stored from register rS 
through rS+nr-1. 


Bytes are stored left to right from each register. The sequence of 
registers wraps around through 10 if required. 
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Table 3-20. Integer Move String Instructions (Continued) 


Operand : 
a 


Store rS,rA,rB The effective address is the sum (rA|0)+(rB). 

rie Let n= XER[25-31]; nis the number of bytes to store. 

Indexed Let nr = CEIL(n/4); nris the number of registers to supply data. 
nconsecutive bytes starting at the EA are stored from register rS 
through rS+nr-1. 

Bytes are stored left to right from each register. The sequence of 
registers wraps around through 0 if required. 


3.5.7 Memory Synchronization Instructions 


Memory synchronization instructions can control the order in which memory operations 
are completed with respect to asynchronous events and the order in which memory 
operations are seen by other processors and by other mechanisms that access memory. 
Additional information about these instructions and about related aspects of memory 
management can be found in Chapter 6, “Memory Management Unit.” . 












The synchronize (sync) and the Enforce In-order Execution of I/O (eieio) instructions are 
handled in the same manner internally to the MPC601. These instructions delay execution 
of subsequent instructions until all previous instructions have completed to the point that 
they can no longer cause an exception, all previous memory accesses are performed 
globally, and the sync or eieio operation is broadcast onto the MPC6()1 bus interface. 


System designs that use a second-level cache should take special care in accepting the 
broadcast sync operation and performing the appropriate actions to guarantee that memory 
references that may be queued internally to the second-level cache have been performed 
globally. 


The number of cycles the sync and eieio instructions take depends on various system-level 
sensitivities and on the processor's state when the instruction is issued. As a result, frequent 
use of these instructions may cause some performance degradation. 


Note that the PowerPC architecture defines the sync instruction with the condition register 
update option enabled to be an invalid form whereas the POWER architecture does not. For 
compatibility, the MPC601 executes this case of the instruction consistently with the 
PowerPC architecture, and it loads an undefined value into condition register field CRO. 


The Instruction Synchronize (isync) instruction causes the MPC601 to purge its instruction 
buffers, wait for any preceding sync instructions to complete and then branch to the next 
sequential instruction (which has the effect of clearing the pipeline behind the isyne 
instruction.) 


The Load Word and Reserve Indexed (Iwarx) and Store Word Conditional Indexed 
(stwex.) instructions provide an atomic update function for a single, aligned word of 
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memory. The Iwarx instruction must be paired with a stwex. instruction with the same 
effective address used for both instructions of the pair. 


The Iwarx and stwex. instructions require the EA to be aligned. Software should not 
attempt to emulate a misaligned lwarx or stwex. instruction because there is no correct way 
to define the address associated with the reservation. 


The granularity with which reservations are managed is 32 bytes. Therefore the memory to 
be accessed by a load and reserve and store conditional instruction should be allocated by 
a system library program. Examples of correct uses of these instructions, to emulate 
primitives such as “Fetch and Add,” “Test and Set,” and “Compare and Swap,” can be 
found Appendix G, “Synchronization Programming Examples.” In general, these 
instructions should be used only in system programs, which can be invoked by application 
programs as needed. 


At the most one reservation exists on any given processor—there are not separate 
reservations for words and for double words. The address associated with the reservation 
can be changed by a subsequent Iwarx instruction. The conditionality of the store 
conditional instruction's store is based only on whether a reservation exists, not on a match 
between the address associated with the reservation and the address computed from the EA 
of the stwex. instruction. A reservation is cleared by executing a stwex. instruction to any 
address by the processor having the reservation, by executing any store instruction to the 
address associated with the reservation, by another processor, execution of an sc instruction 
or any exception incurred by the processor which invoked the reservation. 


The memory synchronization instructions available for the MPC601 are summarized in 
Table 3-21. 


Table 3-21. Memory Synchronization Instructions 


Operand ‘ 
ee 


Enforce In- The eieio instruction provides an ordering function for the effects of 

















Order load and store instructions executed by a given processor. Executing 
Execution an eieio instruction ensures that all memory accesses previously 
of /O initiated by the given processor are complete with respect to main 





memory before allowing any memory accesses subsequently initiated 
by the given processor to access main memory. 









The eieio instruction orders load and store operations to cache 
inhibited memory, and store operations to write through cache 
memory. 






The eieio instruction performs the same function as a syne 
instruction when executed by the MPC601. 
















Instruction 
Synchronize 


This instruction waits for all previous instructions to complete, and 
then discards any prefetched instructions, causing subsequent 
instructions to be fetched (or refetched) from memory and to execute 
in the context established by the previous instructions. This 
instruction has no effect on other processors or on their caches. 
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Table 3-21. Memory Synchronization Instructions (Continued) 


Operand 
[none | inomone | rim | emo 


Load Word rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory 

and addressed by the EA is loaded into register rD. 

ha This instruction creates a reservation for use by a stwex. instruction. 

aes An address computed from the EA is associated with the reservation, 
and replaces any address previously associated with the reservation. 
The EA must be a multiple of 4. If it is not, the alignment exception 
handler will be invoked if the word loaded crosses a page boundary, 
or the results may be undefined. 

Store Word rS,rA,rB The effective address is the sum (rA|0)+(rB). 

Conditional If a reservation exists, register rS is stored into the word in memory 

na 


Indexed addressed by the EA and the reservation is cleared. 
3.5.8 Floating-Point Load and Store Address Generation 
Floating point load and store operations generate effective addresses using the register 
indirect with immediate index mode and register indirect with index mode, the details of 
which are described below. Floating-point loads and stores are not supported for I/O 


accesses when the SR[BUID] is not equal to x’07F’. The use of floating-point loads and 
stores for I/O access will result in an alignment exception. 





































If a reservation does not exist, the instruction completes without 
altering memory. 


The EQ bit in the condition register field CRO is modified to reflect 
whether the store operation was performed (i.e., whether a 
reservation existed when the stwex. instruction began execution). If 
the store was completed successfully, the EQ bit is set to one. 


The EA must be a multiple of 4; otherwise, the alignment exception 
handler will be invoked if the word stored crosses a page boundary, 
or the results may be undefined. 






Executing a sync instruction ensures that all instructions previously 
initiated by the given processor appear to have completed before any 
subsequent instructions are initiated by the given processor. When 
the sync instruction completes, all memory accesses initiated by the 
given processor prior to the sync will have been performed with 
respect to all other mechanisms that access memory. The sync 
instruction can be used to ensure that the results of all stores into a 
data structure, performed in a “critical section” of a program, are seen 
by other processors before the data structure is seen as unlocked. 






The Enforce In-Order Execution of |/O (eieio) instruction may be 
more appropriate than sync for cases in which the only requirement 
is to control the order in which memory references are seen by 1/O 
devices. 


3.5.8.1 Register Indirect with Immediate Index Addressing 

Instructions using this addressing mode contain a signed 16-bit immediate index (d 
operand) which is sign extended to 32 bits, and added to the contents of a general purpose 
register specified in the instruction (rA operand) to generate the effective address. A zero 
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in the rA operand causes a zero to be added to the immediate index (d operand). This is 
shown in the instruction descriptions as (rAl0). 


Figure 3-4 shows how an effective address is generated when using register indirect with 


immediate index addressing. 


0 67 1112 1617 31 


[Spoon] woes [ma] sid 


Instruction Encoding: 













31 








; Effective Address 


GPR (rA) 





Memory 
Access 


FPR (frD/frS) 


Figure 3-4. Register Indirect with Immediate Index Addressing 


3.5.8.2 Register Indirect with Index Addressing 

Instructions using this addressing mode add the contents of two general purpose registers 
(specified in operands rA and rB) to generate the effective address. A zero in the rA 
operand causes a zero to be added to the contents of general purpose register specified in 
operand rB. This is shown in the instruction descriptions as (rAlQ). 


Figure 3-5 shows how an effective address is generated when using register indirect with 
index addressing. 
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1112 1617 21 22 30 31 


Instruction Encédingeh frD/frS | ra | TB Subopcode 





GPR (rB) 





+) 
0 0 31 


GPR (rA) Effective Address 








FPR (frD/frs) Access 
Figure 3-5 Register Indirect with Index Addressing 


The PowerPC architecture defines floating-point load and store with update instructions 
(Ifsu, Ifsux, Ifdu, Ifdux, stfsu, stfsux, stfdu, stfdux) with operand rA=0 as invalid forms 
of the instructions, but the POWER architecture does not. To maintain compatibility with 
the POWER architecture, the MPC601 accesses memory for these cases but inhibits the 
update of the integer register r0. 


In addition, the PowerPC architecture defines floating-point load and store instructions with 
the condition register update option enabled to be an invalid form. For compatibility with 
the POWER architecture, the MPC601 executes the instruction normally, but also writes an 
undefined value into the condition register field CR1. 


The PowerPC architecture defines that the FPSCR[UE] bit should not be used to determine 
whether denormalization should be performed on floating-point stores. The MPC601 
complies with this definition, although this is different from some POWER architecture 
implementations. 


3.5.9 Floating-Point Load Instructions 


There are two basic forms of floating-point load instruction—single-precision and double- 
precision formats. Because the FPRs support only floating-point, double-precision format, 
single-precision floating-point load instructions convert single-precision data to double- 
precision format before loading the operands into the target FPR. This conversion is 
described in Section 3.6.9.1, “Double-Precision Conversion for Floating-Point Load 
Instructions.” Table 3-22 provides a summary of the floating-point load instructions. 
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Table 3-22. Floating-Point Load Instructions 


Operand : 
ee ee 


Load frD,d(rA) The effective address is the sum (rA|0)+d. 
ee The word in memory addressed by the EA is interpreted as a floating- 
~ point single-precision operand. This word is converted to floating- 
Single- : oh . ; 
ae point double-precision format and placed into register frD. 
Precision 

Z 7 

































Load The effective address is the sum (rAj0)+(r B). 

ee The word in memory addressed by the EA is interpreted as a floating- 

Bs ; point single-precision operand. This word is converted to floating- 
nee point double-precision and placed into register frD. 

Precision 

Indexed 

























Load frD,d(rA) The effective address is the sum (rA|0)+d. 
leas The word in memory addressed by the EA is interpreted as a floating- 
a lee point single-precision operand. This word is converted to floating- 
Seite point double-precision (see Section 3.6.9.1, “Double-Precision 
ne U 7 t Conversion for Floating-Point Load Instructions,”) and placed into 
wt peste register frD. 
The EA is placed into the register specified by rA. 
















Load frD,rA,rB The effective address is the sum (rA|0)+(r B). 

oe The word in memory addressed by the EA is interpreted as a floating- 
ae | point single-precision operand. This word is converted to floating- 
ee point double-precision (see Section 3.6.9.1, “Double-Precision 

with e Conversion for Floating-Point Load Instructions,") and placed into 
Update register frD. 

Indexed The EA is placed into the register specified by rA. 





Ee frD,d(rA) The effective address is the sum (rA|0)+d. 

a The double-word in memory addressed by the EA is placed into 
Double- register frD. 

Precision 
















Load frD,rA,jrB The effective address is the sum (rA]0)+(r B). 

ans The double-word in memory addressed by the EAis placed into 
Double- register frD. 

Precision 

Indexed 


Load frD,d(rA) The effective address is the sum (rA|0)+d. 
ces The double-word in memory addressed by the EAis placed into 
D register frD. 
ouble- 
Precision The EA is placed into the register specified by rA. 
with Update 
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Table 3-22. Floating-Point Load Instructions (Continued) 


Operand 
frame [anemone | tet | epson 


Load frD,rA,rB The effective address is the sum (rA|0)+(r B). 
Fioating- The double-word in memory addressed by the EAis placed into 
Point : 
register frD. 
Double- 
Precision The EAis placed into the register specified by rA. 
with 
Update 
Indexed 


3.5.9.1 Double-Precision Conversion for Floating-Point Load 
Instructions 


The steps for converting from single- to double-precision and loading are as follows: 












WORDJ[0-31] is the floating-point, single-precision operand accessed from memory. 
Normalized Operand 


If WORD/1~) >0 and WORD)y_g) <255 
frD 9-1 < WORD 0; 

frD > <—WORD ; 

frD 3 <—WORD ; 

frD 4 <—WORD ; 

frD s_63 < WORD 3-3; II 29b‘0‘ 


Denormalized Operand 


If WORD 1-3 =0 and WORD o-31 #0 
sign < WORD 0 

exp < -126 

frac 9-52 < b‘0‘ Il WORD 9-3; I! 29b‘0* 
normalize the operand 

Do while frac 0 =0 

frac < frac j_52 Il b‘O* 

exp < exp - 1 

End 

frD 9 < sign 

frD 4-11 < exp + 1023 

frD 12-63 < frac 1-52 


Infinity / QNaN / SNaN / Zero 
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If WORD 1-3 =255 or WORD 1-3; =0 
frD 9-1 < WORD v»_ 

frD 7 < WORD ; 

frD 3 < WORD ; 

frD 4 < WORD; 

frD s.63 < WORD 2-3; II 29b‘0‘ 


For double-precision floating-point load instructions, no conversion is required as the data 
from memory is copied directly into the FPRs. 


Many floating-point load instructions have an update form in which register rA is updated 
with the EA. For these forms, if operand rA+0, the effective address is placed into register 
rA and the memory element (word or double-word) addressed by the EA is loaded into the 
floating-point register specified by operand frD. 


3.5.10 Floating-Point Store Instructions 


This section describes floating-point store instructions. There are two basic forms of the 
store instruction—single- and double-precision. Because the FPRs support only floating- 
point, double-precision format, single-precision floating-point store instructions convert 
double-precision data to single-precision format before storing the operands. The 
conversion steps are described in Section 3.6.9.2.1, “Double-Precision Conversion for 
Floating-Point Store Instructions.” Table 3-23 is a summary of the floating point store 
instructions provided by the MPC601. 


Table 3-23 Floating-Point Store Instructions 


Operand : 






















Store frS,d(rA) The EAis the sum (rA|0)+d. 

Floating- The contents of register frS is converted to single-precision and 
Point stored into the word in memory addressed by the EA. 

Single- 

Precision 





Store frS,rA,rB The EA is the sum (rAj0)+(rB). 

Floating- The contents of register frS is converted to single-precision and 
Point stored into the word in memory addressed by the EA. 

Single- 

Precision 

Indexed 








Store frS,d(rA) The EAis the sum (rA[0)+d. 
Floating- The contents of register frS is converted to single-precision and 
ish stored into the word in memory addressed by the EA. 
ingle- 
ae The EAis placed into the register specified by operand rA. 


with Update 
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Table 3-23 Floating-Point Store Instructions (Continued) 


Operand ‘ 

frS,rA,rB The EAis the sum (rA|0)+(rB). 
The contents of register frS is converted to single-precision and 
stored into the word in memory addressed by the EA. 
The EA is placed into the register specified by operand rA. 

frS,d(rA) The effective address is the sum (rA|0)+d. 
The contents of register frS is stored into the double-word in memory 
addressed by the EA. 

frS,rA,jrB The EA is the sum (rAj0)+(rB). 
The contents of register frS is stored into the double-word in memory 
addressed by the EA. 

frS,d(rA) The effective address is the sum (rA|0)+d. 
The contents of register frS is stored into the double-word in memory 
addressed by the EA. 
The EA is placed into register rA. 


Store 
Floating- 
Point 
Single- 
Precision 
with 
Update 
Indexed 


Store 
Floating- 
Point 
Double- 
Precision 


Store 
Floating- 


Point 
Double- 
Precision 
Indexed 


Store 
Floating- 
Point 
Double- 
Precision 
with Update 


Store 
Floating- 
Point 
Double- 
Precision 


with 
Update 
Indexed 


frS,rA,rB The EAis the sum (rAj0)+(rB). 

The contents of register frS is stored into the double-word in memory 
addressed by EA. 
The EA is placed into register rA. 

3.5.10.1 Double-Precision Conversion for Floating-Point Store 

Instructions 
The steps for converting single- to double-precision for floating-point store instructions are 
as follows: 





Let WORD[0-31] be the word in memory written to. 


If frS[1—11] > 896 or frS[1-63] = 0 
WORD(0-1] < frS[0-1] 
WORD[2-31]< frS[5-34] 
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Denormalization Required 


If 874 < frS[1—-11] < 896 
sign < frS[0] 

exp < frSf1—11] - 1023 
frac < b‘1‘ Il frS[12-63] 
Denormalize operand 
Do while exp < -126 
frac < b‘0‘ Il fracO-62 
exp <exp + 1 

End 

WORDO < sign 
WORD[1-8] < x‘00*° 
WORD[9-3 1] < frac[1—23] 


For double-precision floating-point store instructions, no conversion is required as the data 
from the FPRs is copied directly into memory. Many floating-point store instructions have 
an update form, in which register rA is updated with the effective address. For these forms, 
if operand rA + 0, the effective address is placed into register rA. 


Floating-point store instructions are listed in Table 3-23. Recall that rA, rB, and rD denote 
GPRs, while frA, frB, frC, frS and frD denote FPRs. 


3.5.11 Floating-Point Move Instructions 


Floating-point move instructions copy data from one floating-point register to another with 
data modifications as described for each instruction. These instructions do not modify the 
FPSCR. The condition register update option in these instructions controls the placing of 
result status into condition register field CR1. If the condition register update option is 
enabled, then CR1 is set, otherwise CR1 is unchanged. Floating-point move instructions are 
listed in Table 3-24. 
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Table 3-24. Floating-Point Move Instructions 


Floating- The contents of register frB is placed into frD. 
Point Move 


Register fmr Floating-Point Move Register 


fmr. Floating-Point Move Register with CR Update. The dot 
suffix enables the update of the condition register. 


Floating- The contents of register frB with bit 0 inverted is placed into register 
Point : frD. 
Negate 


fneg Floating-Point Negate 
fneg. Floating-Point Negate with CR Update. The dot suffix 


enables the update of the condition register. 


Floating- The contents of frB with bit 0 cleared to 0 is placed into frD. 
Point 
Absolute 
Value 


fabs Floating-Point Absolute Value 
fabs. Floating-Point Absolute Value with CR Update. The dot 
suffix enables the update of the condition register. 


Floating- The contents of frB with bit 0 set to one is placed into frD. 
Point 
Negative 
Absolute 
Value 


fnabs Floating-Point Negative Absolute Value 
fnabs. —_ Floating-Point Negative Absolute Value with CR Update. 
The dot suffix enables the update of the condition register. 





3.6 Flow Control Instructions 


Branch instructions are executed by the BPU. Some of these instructions can redirect 
instruction execution conditionally based on the value of bits in the condition register. 
When the branch processor encounters one of these instructions, it scans the execution 
pipelines to determine whether an instruction in progress may affect the particular 
condition register bit. If no interlock is found, the branch can be resolved immediately by 
checking the bit in the condition register and taking the action defined for the branch 
instruction. 


If an interlock is detected, the branch is considered unresolved and the direction of the 
branch is predicted using the y-bit as described in Table 3-25. The interlock is monitored 
while instructions are fetched for the predicted branch. When the interlock is cleared, the 
branch processor determines whether the prediction was correct based on the value of the 
condition register bit. If the prediction is correct, the branch is considered completed and 
instruction fetching continues. If the prediction is incorrect, the prefetched instructions are 
purged, and instruction fetching continues along the alternate path. 


3.6.1 Branch instruction Address Calculation 


Branch instructions can change the sequence of instruction execution. Instruction addresses 
are always assumed to be on word boundaries with the MPC601; therefore the processor 
ignores the two low-order bits of the generated branch target address. 
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Branch instructions compute the effective address (EA) of the next instruction address 
using the following addressing modes: 

¢ Branch relative 

¢ Branch to absolute address 

¢ Branch conditional to relative address 

¢ Branch conditional to absolute address 

¢ Branch conditional to link register 

¢ Branch conditional to count register 


3.6.1.1 Branch Relative Address Mode 


Instructions that use branch relative addressing generate the next instruction address by 
sign extending the immediate displacement operand LI and adding the resultant value to 
the current instruction address. Branches using this address mode have the absolute 
addressing option (AA) disabled. If the link register update option (LK) is enabled, the 
effective address of the instruction following the branch instruction is placed in the link 
register. 


Figure 3-6 shows how the branch target address is generated when using the branch relative 
addressing mode. 


0 6 7 29 30 31 





Branch Target Address 


Figure 3-6. Branch Relative Addressing 


3.6.1.2 Branch Conditional Relative Address Mode 


If the branch conditions are met, instructions that use the branch conditional relative 
address mode generate the next instruction address by sign extending the immediate 
displacement operand (BD) and adding the resultant value to the current instruction 
address. Branches using this address mode have the absolute addressing option (AA) 
disabled. If the link register update option (LK) is enabled, the effective address of the 
instruction following the branch instruction is placed in the link register. 
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Figure 3-7 shows how the branch target address is generated when using the branch 
conditional relative addressing mode. 


0 67 1412 1617 30 31 





0 16 17 29 30 31 





Branch Target Address 


Figure 3-7. Branch Conditional Relative Addressing 


3.6.1.3 Branch to Absolute Address Mode | 


Instructions that use branch to absolute address mode generate the next instruction address 
by sign extending the LI operand. Branches using this address mode have the absolute 
addressing option (AA) enabled. If the link register update option (LK) is enabled, the 
effective address of the instruction following the branch instruction is placed in the link 
register. 


Figure 3-8 shows how the branch target address is generated when using the branch to 
absolute address mode. 


0 6 7 29 30 31 


0 29 30 31 


Branch Target Address jo | 0| 


Figure 3-8. Branch to Absolute Addressing 
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3.6.1.4 Branch Conditional to Absolute Address Mode 


If the branch conditions are met, instructions that use the branch conditional to absolute 
address mode generate the next instruction address by sign extending the BD operand. 
Branches using this address mode have the absolute addressing option (AA) enabled. If the 
link register update option (LK) is enabled, the effective address of the instruction 
following the branch instruction is placed in the link register. 


Figure 3-9 shows how the branch target address is generated when using the branch 
conditional to absolute address mode. 


0 67 1112 1617 29 30 31 







31 


Next Sequential Instruction Address 


16 17 29 30 31 


[serenonee [ee [ie 


29 30 31 


Branch Target Address | o| o| 


Figure 3-9. Branch Conditional to Absolute Addressing 





3.6.1.5 Branch Conditional to Link Register Address Mode 


If the branch conditions are met, the branch conditional to link register instruction 
generates the next instruction address by fetching the contents of the link register and 
clearing the two low order bits to zero. If the link register update option (LK) is enabled, 
the effective address of the instruction following the branch instruction is placed in the link 
register. 


Figure 3-10 shows how the branch target address is generated when using the branch 
conditional to link register address mode. 


3.6.1.6 Branch Conditional to Count Register 


If the branch conditions are met, the branch conditional to count register instruction 
generates the next instruction address by fetching the contents of the count register and 
clearing the two low order bits to zero. If the link register update option (LK) is enabled, 
the effective address of the instruction following the branch instruction is placed in the link 
register. 
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7 1112 1617 2122 30 31 


Instruction Encoding: Senet | te [LK 







Next Sequential Instruction Address 


Figure 3-10. Branch Conditional to Link Register Addressing 


Figure 3-11shows how the branch target address is generated when using the branch 
conditional to count register address mode. 


0 67 1112 1617 21 22 30 31 





Branch Target Address 


Figure 3-11. Branch Conditional to Count Register Addressing 
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When the branch instructions contain immediate addressing operands, the target addresses 
can be computed sufficiently ahead of the branch instruction that instructions can be 
prefetched along the target path. If the branch instructions use the link and count registers, 
instructions along the target path can be prefetched if the link or count register is loaded 
sufficiently ahead of the branch instruction. 


Branching can be conditional or unconditional, and the return address can optionally be 
provided. If the return address is to be provided, the effective address of the instruction 
following the branch instruction is placed in the link register after the branch target address 
has been computed. This is done regardless of whether the branch is taken. 


For branch conditional instructions, the BO operand specifies the conditions under which 
the branch is taken. The first four bits of the BO operand specify how the branch is affected 
by or affects the condition and count registers. The fifth bit, shown in Table 3-25 as having 
the value y, may be used by some implementations for branch prediction as described 
below. 


The encodings for the BO operands are shown in Table 3-25. 


Table 3-25. BO Operand Encodings 
ee Teton 


o000y Decrement the CTR, then branch if the decremented CTR #0 and the condition is 
FALSE. 

0001 y Decrement the CTR, then branch if the decremented CTR = 0 and the condition is 
FALSE. 


001 zy Branch if the condition is FALSE. 


0100y Decrement the CTR, then branch if the decremented CTR #0 and the condition is 
TRUE. 


0101y Decrement the CTR, then branch if the decremented CTR = 0 and the condition is 
TRUE. 


The zindicates a bit that must be zero; otherwise, the instruction form is invalid. 





The ybit provides a hint about whether a conditional branch is likely to be taken and is used by the 
MPC601 to improve performance. Other implementations may ignore the y bit. 


The “branch always” encoding of the BO operand does not have a “y” bit. 
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Setting the “y” bit to 0 indicates that the following behavior is likely: 
¢ For bex with a negative value in the displacement operand, the branch is taken. 


¢ Inall other cases (bex with a non-negative value in the displacement operand, belrx, 
or bectrx), the branch is not taken. 


Setting the “y” bit to 1 reverses the preceding indications. 


The sign of the displacement operand is used as described above even if the target is an 


669 


absolute address. The default value for the “y” bit should be 0, and should only be set to 1 


if software has determined that the prediction corresponding to “y” = | is more likely to be 


correct than the prediction corresponding to “y” = 0. Software that does not compute branch 
predictions should set the “‘y” bit to zero. 


For all three of the branch conditional instructions, the branch should be predicted to be 
taken if the value of the following expression is 1, and to fall through if the value is 0. 


((BO[O] & BO[2]) |S) 6 BO[4] 


In the expression above, S (bit 16 of the branch conditional instruction coding) is the sign 
bit of the displacement operand if the instruction has a displacement operand and is (0) if the 
operand is reserved. BO[4] is the “y” bit, or 0 for the “branch always” encoding of the BO 
operand. (Advantage is taken of the fact that, for belrx and bectrx, bit 16 of the instruction 
is part of a reserved operand and therefore must be 0).) 


3.6.2 Bl Operand 


The 5-bit BI operand in branch conditional instructions specifies which of the 32 bits in the 
CR represents the condition to test. 


3.6.3 Basic Branch Mnemonics 

The mnemonics in Table 3-26 allow all the common BO operand encodings to be specified 
as part of the mnemonic, along with the absolute address (AA) and set link register (LK) 
bits. 


Notice that there are no simplified mnemonics for relative and absolute unconditional 
branches. For these, the basic mnemonics b, ba, bl, and bla are used. 
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Table 3-26. Simplified Branch Mnemonics 











Branch Semantics 


LR bit not set LR bit set 
Branch unconditionally 


bc bca belr to os bcl bcla belrl to bectrl 
Relative | Absolute | LR Relative } Absolute | LR to CTR 
eet 
| Branch if condition true_| if condition | Branch if condition true_| 


Branch if condition 
false 
Decrement CTR, 
branch if CTR non-zero 


Decrement CTR, 
branch if CTR non-zero 
AND condition true 























Decrement CTR, 
branch if CTR non-zero 
AND condition false 


Decrement CTR, 
branch if CTR zero 


Decrement CTR, 
branch if CTR zero 
AND condition true 











Decrement CTR, bdzfl bdzfla 
branch if CTR zero 
AND condition false 


Table 3-26 provides the abbreviated set of simplified mnemonics for the most commonly 
performed conditional branches. Unusual cases of conditional branches can be coded using 
a basic branch conditional mnemonic (bc, belr, bectr) with the condition to be tested 
specified as a numeric first operand. 


Instructions using a mnemonic from Table 3-26 that tests a condition specify the condition 
as the first operand of the instruction. Table 3-27 summarizes the mnemonic symbols and 
the equivalent numeric values used to interpret a condition register CR field during a branch 
conditional instruction compare operation. 
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Table 3-27. Condition Register CR Field Bit Symbols 


Summary overflow 


Unordered (after floating- 


point comparison) 








SS 
ca ae 
ees aoa 


Table 3-28 summarizes the mnemonic symbols and the equivalent numeric values used to 
identify the condition register CR field to be evaluated by the compare operation. 


Table 3-28. Condition Register CR Field Identification Symbols 


Symbol 


ee ee 


CR2 


CR3 
CR4 
CR5 
CR6 
CR7 





The simplified branch mnemonics and the symbols in Table 3-27 and Table 3-28 are 
combined in an expression that identifies the bit (0-31) of CR to be tested, as follows: 


Examples: 


1. Decrement CTR and branch if it is still non-zero (closure of a loop controlled by a 
count loaded into CTR). 


bdnz target (equivalent to be 16,0,target) 
2. Same as (1) but branch only if CTR is non-zero and condition in CRO is “equal.” 
bdnz eq,target (equivalent to be 8,2,target) 
3. Same as (2), but “equal” condition is in CRS. 
bdnzt cr5+eq,target (equivalent to be 8,22,target) 
4. Branch if bit 27 of CR is false. 
bf 27,target (equivalent to be 4,27,target) 
5. Same as (4), but set the link register. This is a form of conditional “‘call.” 
bfl 27,target (equivalent to bel 4,27,target) 
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3.6.4 Branch Mnemonics Incorporating Conditions 


The mnemonics defined in Table 3-30 are variations of the “branch if condition true” and 
“branch if condition false” BO encodings, with the most common values of the BI operand 
represented in the mnemonic rather than specified as a numeric operand. 


The two-letter codes for the most common combinations of branch conditions is shown in 
Table 3-29. 


Table 3-29. Two-Letter Codes for Branch Comparison Conditions 


Less than or equal 


Greater than or equal 


fot | Greater than 
ph Not less than 













Not greater than 
Summary overflow 


Not summary overflow 





Unordered (after floating-point comparison) 


Not unordered (after floating-point 
comparison) 


These codes are reflected in the simplified mnemonics shown in Table 3-30. 








Table 3-30. Simplified Branch Mnemonics Incorporating Comparison Conditions 









LR bit not set LR bit set 
Branch Semantics 


be bea belrto | bectr | bel bela beirlto | bectrl 
Relativ | Absolute | LR to CTR | Relative | Absolute | LR to CTR 
e 


[Borciestnorenal [ve [se [wor ser [wer [a [ ost [Bos 
[acrcirewat [eq [ee] bear [omer [Beet [vous [bott_[ Bons 
[Barcireawren [90 | bave [boo] eer 
Barcitgsuervan foot | 

[Bancieatiestan [onl 
[Barca [oe 
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Table 3-30. Simplified Branch Mnemonics Incorporating Comparison Conditions 






LR bit not set LR bit set 


be bea bclr to bectr bel bela belri to | bectrl 
ore Absolute | LR to CTR | Relative | Absolute | LR to CTR 
| Branch if not greater than if not | Branch if not greater than than 


Branch if summary 
overflow 

Branch if not summary 
overflow 


| Branch unordered =| if unordered 


Branchif not unordered ee eee ee ee eee 


Instructions using the mnemonics in Table 3-30 specify the condition register field in an 
optional first operand. If the CR field being tested is CRO, this operand need not be 
specified. Otherwise, one of the CR field symbols listed in Table 3-28 is coded as the first 
operand. 








Branch Semantics 





Examples: 


1. Branch if CRO reflects condition “not equal.” 
bne target (equivalent to be 4,2,target) 


2. Same as (1), but condition is in CR3. 
bne cr3,target (equivalent to be 4,14, target) 

3. Branch to an absolute target if CR4 specifies “greater than,” setting the link register. 
This is a form of conditional “call”, as the return address is saved in the link register. 
bgtla cr4,target (equivalent to bela 12,17,target) 


4. Same as (3), but target address is in the count register. 
bgetctrl cr4 (equivalent to bectrl 12,17) 
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3.6.5 Branch Instructions 
Table 3-31 describes the branch instructions provided by the MPC601. 


Table 3-31. Branch Instructions 


Operand 
ee 


Branch imm_addr Branch. Branch to the address computed as the sum of 



































the immediate address and the address of the current 
instruction. 

ba Branch Absolute. Branch to the absolute address 
specified. 

bl Branch then Link. Branch to the address computed as the 


sum of the immediate address and the address of the 
current instruction. The instruction address following this 
instruction is placed into the link register (LR). 
bla Branch Absolute then Link. Branch to the absolute 
address specified. The instruction address following this 
instruction is placed into the link register (LR). 














Branch 
Conditional 


BO,Bi, 
target_addr 


The BI operand specifies the bit in the condition register (CR) to be 
used as the condition of the branch. The BO operand is used as 
described in Table 3-25. 


be Branch Conditional. Branch conditionally to the address 
computed as the sum of the immediate address and the 
address of the current instruction. 


bca . Branch Conditional Absolute. Branch conditionally to the 
absolute address specified. 
bel Branch Conditional then Link. Branch conditionally to the 


address computed as the sum of the immediate address 
and the address of the current instruction. The instruction 
address following this instruction is placed into the link 
register. 
bcla Branch Conditional Absolute then Link. Branch 
conditionally to the absolute address specified. The 
instruction address following this instruction is placed into 
the link register. 













Branch 
Conditional 
to Link 

Register 


The BI operand specifies the bit in the condition register to be used 
as the condition of the branch. The BO operand is used as described 
in Table 3-25. 


belr Branch Conditional to Link Register. Branch conditionally 
to the address in the link register. 

belrl Branch Conditional to Link Register then Link. Branch 
conditionally to the address specified in the link register. 
The instruction address following this instruction is then 

placed into the link register. 
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Table 3-31. Branch Instructions (Continued) 


Branch The BI operand specifies the bit in the condition register to be used 
Conditional as the condition of the branch. The BO operand is used as described 
to Count in Table 3-25. 

Register 


bectr Branch Conditional to Count Register. Branch 
conditionally to the address specified in the count register. 
bectrl Branch Conditional to Count Register then Link. Branch 
conditionally to the address specified in the count register. 
The instruction address following this instruction is placed 
into the link register. 
Note: If the “decrement and test CTR” option is specified (BO[2]=0), 
the instruction form is invalid. For the MPC601, the decremented 
count register is tested for zero and branches based on this test, but 
instruction fetching is directed to the address specified by the non- 
decremented version of the count register. Use of this invalid form of 
this instruction is not recommended. 





3.6.6 Condition Register Logical Instructions 


Similar to the system call (sc) instruction, condition register logical instructions, shown in 
Table 3-32, and the move condition register field (merf) instruction are defined as flow 
control instructions, although they are executed by the IU. 


Note that if the link register update option (LR) is enabled for any of these instructions, the 
PowerPC architecture defines these forms of the instructions as invalid; however, the 
MPC6()1 executes these instructions and leaves the link register in an undefined state. 


Table 3-32. Condition Register Logical Instructions 


Condition ; ; The bit in the condition register specified by crbAis ANDed with the 
Register bit in the condition register specified by crbB. The result is placed 
AND into the condition register bit specified by crbD. 


Condition crbD,crbA,crbB | The bit in the condition register specified by crbAis ORed with the 
Register bit in the condition register specified by crbB. The result is placed 
OR into the condition register bit specified by crbD. 


Condition crbD,crbA,crbB | The bit in the condition register specified by crbAis XORed with the 
Register bit in the condition register specified by crbB. The result is placed 
XOR into the condition register bit specified by crbD. 


Condition crbD,crbA,crbB | The bit in the condition register specified by crbAis ANDed with the 
Register bit in the condition register specified by crbB. The complemented 
NAND result is placed into the condition register bit specified by crbD. 


Condition crbD,crbA,crbB | The bitin the condition register specified by crbA is ORed with the 
Register bit in the condition register specified by crbB. The complemented 
NOR result is placed into the condition register bit specified by erbD. 
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Table 3-32. Condition Register Logical Instructions (Continued) 


Operand F 
jane [memonc| Brent | mmm 


Condition crbD,crbA, The bit in the condition register specified by crbAis XORed with the 
Register crbB bit in the condition register specified by crbB. The complemented 

Equivale result is placed into the condition register bit specified by crbD. 
nt 

































Condition 
Register 
AND 

with 
Comple 
ment 


crbD,crbA, 
crbB 


The bit in the condition register specified by erbA is ANDed with the 
complement of the bit in the condition register specified by crbB and 
the result is placed into the condition register bit specified by crbD. 






















Condition 
Register 
OR with 
Comple 
ment 


crbD,crbA, 
crbB 


The bit in the condition register specified by crbA is ORed with the 
complement of the bit in the condition register specified by crbB and 
the result is placed into the condition register bit specified by crbD. 















Move 
Condition 
Register 
Field 


The contents of crfS are copied into crfD. No other condition 
register fields are changed. 


3.6.7 System Linkage Instructions 


This section describes the system linkage instructions (see Table 3-33). The system call (sc) 
instruction permits a program to call on the system to perform a service and the system to 
return from performing a service or from processing an exception. 


Table 3-33. System Linkage Instructions 


Operand 


System Call | sc When executed, the effective address of the instruction following the 
sc instruction is placed into SRRO. Bits 16-31 of the MSR are placed 
into bits 16-31 of SRR1, and bits 0-15 of SRR1 are set to undefined 
values. Then a system call exception is generated. The exception 

causes the MSR to be altered as described in Section 5.4, “Exception 
Definitions.” 






















The exception causes the next instruction to be fetched from offset 
x'C00' from the base physical address indicated by the new setting of 
MSRI[IP]. For a discussion of POWER compatibility with respect to 

instruction bits 16-29, refer to the Appendix K, “Incompatibilities with 
the POWER Architecture. To ensure compatibility with future versions 
of the PowerPC architecture, bits 16-29 should be coded as zero and 
bit 30 should be coded as a 1. 


This instruction is context synchronizing. 
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Table 3-33. System Linkage Instructions (Continued) 


Operand 


Return Bits 16-31 of SRA1 are placed into bits 16-31 of the MSR, then the 
from next instruction is fetched, under control of the new MSR value, from 
Interrupt the address SRRO[0-29] || b‘00'. 
This instruction is a supervisor-level instruction and is context 
synchronizing. 


3.6.8 Simplified Mnemonics for Branch Processor Instructions 


To simplify assembly language programming, a set of simplified mnemonics and symbols 
is provided that defines simple shorthand for the most frequently used forms of branch 
conditional, compare, trap, rotate and shift, and certain other instructions. 













Mnemonics are provided so that branch conditional instructions can be coded with the 
condition as part of the instruction mnemonic rather than as a numeric operand. Some of 
these are shown as examples with the branch instructions. 


In some implementations the processor may keep a stack of the link register values most 
recently set by branch and link instructions, with the possible exception of the form shown 
below for obtaining the address of the next instruction. To benefit from this stack, the 
following programming conventions should be used. 


Let A, B, and Glue be programs. 
Obtaining the address of the next instruction—use the following form of branch and link. 


bel 20,31,$+4 


Loop Counts— Keep them in the count register, and use one of the branch conditional 
instructions to decrement the count and to control branching (e.g., branching back to the 
start of a loop if the decremented counter value is non-zero). 


Computed GOTOs, Case Statements, Etc.—Use the count register to hold the address to 
branch to, and use the bectr instruction with the link register option disabled (LK=0) to 
branch to the selected address. 


Direct Subroutine Linkage—Here A calls B and B returns to A. The two branches should be 
as follows: 


¢ Acalls B: use a branch instruction that enables the link register (LK=1). 


¢ Breturns to A: use the belr instruction with the link register option disabled(LK=0) 
(the return address is in, or can be restored to, the link register). 





MOTOROLA Chapter 3. Addressing Modes and Instruction Set Summary 3-77 


Indirect Subroutine Linkage—Here A calls Glue, Glue calls B, and B returns to A rather than 
to Glue. (Such a calling sequence is common in linkage code used when the subroutine that 
the programmer wants to call, here B, is in a different module from the caller: the binder 
inserts “glue” code to mediate the branch.) The three branches should be as follows: 


¢ Acalls Glue. Use a branch instruction that sets the link register with the link register 
option enabled(LK=1). 


¢ Glue calls B. Place the address of B in the count register, and use the bectr 
instruction with the link register option disabled (LK=0). 


¢ BreturnstoA. Use the belr instruction with the link register option disabled((LK=0) 
(the return address is in, or can be restored to, the link register). 


PowerPC-compliant assemblers provide the mnemonics and symbols listed here and 
possibly others. Programs written to be portable across various assemblers for the PowerPC 
architecture should not assume the existence of mnemonics not defined here. 


3.6.9 Trap Mnemonics 


The trap instructions shown in Table 3-34 are provided to test for a specified set of 
conditions. If any of the conditions tested by a trap instruction are met, the system trap 
handler is invoked. If the tested conditions are not met, instruction execution continues 
normally. 


Table 3-34. Trap Instructions 


Operand 


Trap Word TO,rA,SIMM | The contents of rA is compared with the sign-extended SIMM 

Immediate operand. If any bit in the TO operand is set to 1 and its corresponding 
condition is met by the result of the comparison, then the system trap 
handler is invoked. 





Trap Word TO,rA,rB The contents of rA is compared with the contents of rB. If any bit in 
the TO operand is set to 1 and its corresponding condition is met by 
the result of the comparison, then the system trap handler is invoked. 


The trap instructions evaluate a trap condition as follows: 


The contents of register rA is compared with either the sign-extended SIMM field or with 
the contents of register rB, depending on the trap instruction. The comparison results in 
five conditions which are ANDed with operand TO. If the result is not 0, the trap exception 


- handler is invoked. These conditions are provided in Table 3-35. 
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Table 3-35. TO Operand Bit Encoding 


TO Bit ANDed with Condition 










Greater than 


Logically less than 
Logically greater than 


A standard set of codes has been adopted for the most common combinations of trap 
conditions, as shown in Table 3-36. The mnemonics defined in Table 3-37 are variations of 
the trap instructions, with the most useful values of the trap instruction TO operand 
represented as a mnemonic rather than specified as a numeric operand. 


Table 3-36. Trap Mnemonics Coding. 


TO Operand 


fe [as Fc Ce OC 
een oa ce fe fo 









Greater than 


Groaterthanorogel faz fo fa [st fo fo | 


16 

4 
Not less than 2 ie 
pNoteqal tft fo fo Jo 
Not greater than eee oe 
Logically less than ee ae ae ee ee 
Logicaly less thanor equal [6 [o_o [1 [1 |o | 


Logically greater than or 1 
equal 










begiealy pesiertete ts 


Note: <U indicates an unsigned less than evaluation will be performed. 
>U indicates an unsigned greater than evaluation will be performed. 
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These codes are reflected in the mnemonics shown in Table 3-37. 


Table 3-37. Trap Mnemonics 


32-Bit Comparison 


Trap Semantics twi Immediate tw Register 
Trap unconditionally =e 
Trap if less than twit 


Trap if less than or equal 
Trap if equal 
Trap if greater than or equal 


Trap if greater than Ftwott | 


Trap if not less than 


Trap if logically less than 


Trap if logically less than or equal 


Trap if not equal 


Trap if logically greater than or equal 





Trap if logically greater than twiigi 
Trap if logically not less than twin 
Examples: 
¢ Trap if Rx, considered as a 32-bit quantity, is logically greater than x‘7FF*. 
twlg rA, x‘7FF’ (equivalent to twi 1,rA, x‘7FF‘) 
¢ Trap unconditionally. 


trap (equivalent to tw 31,0,0) 


3.7 Processor Control Instructions 


Processor control instructions are used to read from and write to the machine state register 


(MSR), special purpose registers (SPRs) and condition register (CR). 


3.7.1 Move to/from Special Purpose Register Instructions 


The MPC601 defines an additional register (MQ register) to the user register set and 
programming model. As a result, the mtspr and mfspr instructions have been extended to 
accommodate access to the MQ register for the MPC601. The SPR encoding for the MQ 
register is b‘OO000 00000‘. 


The MPC601 also allows user-level read access to the decrementer register (DEC). The 
SPR encoding for DEC is b‘00110 00000’ and is valid only for the mfspr instruction. 
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During the execution of the mtspr instruction, the MPC601 does not fully decode SPR 
values for the XER, DEC, LR, MQ, CTR, RTCL or RTCU registers. Similarly, during the 
execution of the mfspr, the MPC601 does not fully decode the SPR values for the XER, 
LR, MQ, or CTR registers. Instead, it only decodes the upper five bits of the SPR field and 
assumes that the lower five bits are cleared to zeros. The PowerPC architecture defines the 
mfspr and mtspr instructions with the condition register update option enabled to leave 
condition register field CRO undefined. In this case, the MPC601 sets condition register 
field CRO to an undefined value. Move to/from Special Purpose Register instructions are 
listed in Table 3-38. For more information see Chapter 10, “Instruction Set." 


Simplified mnemonics are provided for the mtspr and mfspr instructions so they can be 
coded with the SPR name as part of the mnemonic rather than as a numeric operand. Some 
of these are shown as examples with the two instructions. 


Table 3-38. Move to/from Special Purpose Register Instructions 


Operand . 
jane | uoonone | Sent | open 


Move to The SPR field denotes a special purpose register, encoded as shown 
Special in Table 3-39 and Table 3-40 below. The contents of rS are placed 
Purpose into the designated SPR. 


Register Simplified mnemonic examples: 
mtxer rA mtspr 1,rA 
mtlrrA mtspr 8,rA 
mtctr rA mtspr 9,rA 































The SPR field denotes a special purpose register, encoded as shown 
in Table 3-39 and Table 3-40 below. The contents of the designated 
SPR are placed into rD. 


Move from 
Special 

Purpose 
Register 






Simplified mnemonic examples: 
mfxer rA mfspr rA,1 
mflr rA mfspr rA,8 
mfctr rA mfspr rA,9 





















The contents of rS are placed into the condition register under control 
of the field mask specified by operand CRM. The field mask identifies 
the 4-bit fields affected. Let jbe an integer in the range 0-7. If 
CRM(/) = 1, then CR field i(CR bits 4*/ through 4*i4+3) is set to the 
contents of the corresponding field of rS. 


Move to 
Condition 
Register 

Fields 






In some PowerPC implementations, this instruction may perform 
more slowly when only a portion of the fields are updated as opposed 
to all of the fields. This is not true for the MPC601. 







The contents of XER[0-3] are copied into the condition register field 
Condition designated by crfD. All other fields of the condition register remain 
Register unchanged. XER[0-3] is cleared to 0. 
from XER 

















Move from The contents of the condition register are placed into rD. 
Condition 


Register 
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Table 3-38. Move to/from Special Purpose Register Instructions (Continued) 


Operand , 
[rane | menone | Smet | ermatn | 


Move to The contents of rS are placed into the MSR. 

Machine This instruction is a supervisor-level instruction and is context 
State synchronizing. 
Register 


Move from The contents of the MSR are placed into rD. This is a supervisor- 
Machine level instruction. 
State 


Register 


For mtspr and mfspr instructions, the SPR number coded in assembly language does not 
appear directly as a 10-bit binary number in the instruction. The number coded is split into 
two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in 
bits 16-20 of the instruction and the low-order 5 bits in bits 11-15. 


Table 3-39 summarizes SPR encodings that the MPC601 recognizes when operating at the 
user level. 


Table 3-39. User-Level SPR Encodings 


Register aa 


oT 
Oo 
a 
| 
er es 
Peed 










Decimal 
Value in rD 







SPR[0—4] SPR[5-9] 








b'00100 00000' RTCU Real- time clock upper 
register’ 


b'00101 00000' RTCL Real- time clock lower register! 
b'00110 00000 Decrementer register @ 


' Read-only. 


2 Access to the DEC register is restricted to read-only while the 
processor is in the user-mode. User-level decrementer access is 
provided for POWER compatibility, and is specific to the MPC601. 
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Table 3-40 summarizes SPR encodings that the MPC601 recognizes when operating at the 
supervisor level. 


Table 3-40. Supervisor-Level SPR Encodings 


Register Description 
Name 


§ | seorer ico [ rte [Retr sok ver ett 
i a 
service register 

[sree oone™ [aR [Bataceesegior 
a b'10110 00000’ Bre Ss Decrementer register 
a b’'11001 00000" Table search descriptor register 
[a8 [wrretozooon [sano [Save andes 
[at [error aeoon [sar [Saveantaseregtors 
| 282, | bttoto 010000 | SEAR External access register 
[se [eter ene [aaa neaaion arate 
ee dat 


— b'10000 11111' Checkstop Checkstop sources and 
(HIDO) enables register 
b'10001 11111’ Debug Debug modes register 
(HID1) 
1010 b'10010 11111’ IABR Instruction address breakpoint 
(HID2) register 
b'10101 11111’ DABR Data address breakpoint 
(HID 5) register 






















Decimal 
Value in rD 






SPR[0-4] SPR[5-9] 
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Table 3-40. Supervisor-Level SPR Encodings (Continued) 


Register Description 
Name 
1023 b'41144 41111’ PIR Processor identification 
(HID15) register 


If the SPR field contains any value other than one of these implementation-specific values 
or one of the values shown in Table 3-40, the instruction form is invalid. For an invalid 
instruction form in which SPR[0]=1, the system supervisor-level instruction error handler will 
be invoked if the instruction is executed by a user-level program. If the instruction is 
executed by a supervisor-level program, the result is a no-op. 











Decimal 
Value in rD 









SPR[0-4] SPR[5-9] 


SPR[0]=1 if and only if writing the register is supervisor-level. Execution of this instruction 
specifying a defined and supervisor-level register when MSR[PR]=1 results in a privilege 
violation type program exception. 


SPR encodings for the DEC, MQ, RTCL and RTCU registers are not part of the PowerPC 
architecture. 


The PVR (processor version register) is a read-only register. 


Note: For compatibility with future versions of this architecture, only SPR numbers 
discussed in these instruction descriptions should be used. 


The mtspr and mfspr instructions specify a Special Purpose Register (SPR) as a numeric 
operand. Simplified mnemonics are provided that represent the SPR in the mnemonic rather 
than requiring it to be coded as an operand. Table 3-41 below specifies the simplified 
mnemonics provided on the MPC601 for SPR operations. 


Table 3-41. SPR Simplified Mnemonics 


Special Purpose move to.Srn Move to SPR Move romern Move from SPR 


: Simplified ; Simplified ; 
Register Mnemonic Instruction Miemonic Instruction 


Integer unit exception mtxer rS mtspr 1,rS mfspr rD,1 
register 


Link register mtspr 8,rS mfspr rD,8 


DAE/source instruction | mtdsisr rS mtspr 18,rS mfspr rD,18 
service register 

Data address register |midarrs | mtspr 19,rS mfspr rD,19 
Decrementer mtspr 22,rS mfspr rD,22 


Status save/restore mtsrro rS mtspr 26,rS mfspr rD,26 

register 0 

Status save/restore mtspr 27,rS mfspr rD,27 
‘register 1 
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Table search descriptor mtspr 25,rS mfspr rD,25 
register 1 


Table 3-41. SPR Simplified Mnemonics 


Move to SPR Movefrom SPR 


amenes Move to SPR are 
Simplified instruction Simplified 


Move from SPR 
Instruction 


Special Purpose 
Register 


Mnemonic Mnemonic 


General special mtsprg n, rs mtspr 272+n,rS mfsprg rD, n mfspr rD,272+n 
purpose registers GO 
through G3 


External access register mtspr 282,rS mfspr rD,282 


Processor version mfspr rD,287 
register 


BAT register, upper mtspr 528+(2*n),rS | mfibaturD,n | mfspr rD,528+(2"n) 
Bat register, lower mtibatl n, rS mtspr 529+ (2*n),rS_ | mfibatl rD, n, mfspr rD,529+(2*n) 


3.8 Memory Control Instructions 


This section describes memory control instructions, which include the following: 





* Cache management instructions . 
* Segment register manipulation instructions 
¢ Translation lookaside buffer management instructions 


3.8.1 Supervisor-Level Cache Management Instruction 


This section summarizes the operation of the only supervisor-level cache management 
instruction implemented on the MPC601. 
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Table 3-42. Cache Management Supervisor-Level Instruction 


Data The effective address is the sum (rA|0)+(rB). 
Cache 
Block 
Invalidate 


The action taken depends on the memory mode associated with the 
target, and the state (modified, unmodified) of the block. The 
following list describes the action to take if the block containing the 
byte addressed by the EAis or is not in the cache. 


*«  Coherency Required (WIM = xx1) 


— Unmodified Block—Iinvalidates copies of the block in the 
caches of all processors. 


Modified Block—Invalidates copies of the block in the 
caches of all processors. (Discards the modified 
contents.) 


Absent Block—lf copies are in the caches of any other 


processor, causes the copies to be invalidated. 
(Discards any modified contents.) 


¢ Coherency Not Required (WIM = xx0) 


Unmodified Block—Invalidates the block in the local 
cache. 


Modified Block—Invalidates the block in the local cache. 
(Discards the modified contents.) 


Absent Block—No action is taken. 


When data address translation is enabled, MSR[DT]=1, and the 
logical address has no translation, a data access exception occurs. 
See Section 5.4.3, “Data Access Exception (x'00300’)." 


The function of this instruction is independent of the write-through 
and cache-inhibited/allowed modes determined by the WIM bit 
settings of the block containing the byte addressed by the EA. 


This instruction is treated as a store to the addressed byte with 
respect to address translation and protection. The reference and 
change bits are modified appropriately. 


Ifthe EA specifies a memory address for which T=1 in the 
corresponding segment register, the instruction is treated as a no-op. 





3.8.2 User-Level Cache Instructions 


The instructions summarized in this section provide user-level programs the ability to 
manage the MPC601’s unified cache. Note that the term block in the context of the on-chip 
cache refers to a sector within the cache (and not a block defined by the block address 
translation (BAT) mechanism). 


As with other memory-related instructions, the effect of the cache instructions on memory 
are weakly consistent. If the programmer needs to ensure that cache or other instructions 
have been performed with respect to all other processors and mechanisms, a syne 
instruction must be placed in the program following those instructions. 
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When data address translation is disabled (MSR[DT]=0), the Data Cache Block Set to Zero 
(dcbz) instruction allocates a line in the cache and may not verify that the physical address 
is valid. If a line is created for an invalid physical address, a machine check condition may 
result when an attempt is made to write that line back to memory. The line could be written 
back as the result of the execution of an instruction that causes a cache miss and the invalid 
addressed line is the target for replacement or a Data Cache Block Store (debst) instruction. 


Any cache control instruction that generates an effective address that corresponds to an I/O 
controller interface segment (SR[T]=1) that has the SR[BUID] field equal to x’07F’ 
translates the address appropriately and performs the cache operation based on that address. 
A cache control instruction that generates an effective address that corresponds to an I/O 
controller interface segment (SR[T]=1), but with the SR[BUID] not equal to x’07F’ is 
treated as a no-op. 


Since the MPC601 is implemented with a unified (combined instruction and data) cache, 
the Instruction Cache Block Invalidate (icbi) instruction is treated as a no-op by the 
MPC6()1 processor. Table 3-43 summarizes the cache instructions that are accessible to 
user-level programs. 


Table 3-43. User-Level Cache Instructions 


Operand : 
j none | amen | Brest | maton 


Data The EAis the sum (rA/0)+(rB). 
Cache 
Block Touch 




































This instruction provides a method for improving performance 
through the use of software-initiated prefetch hints. The MPC601 
performs the fetch for the cases when the address hits in the UTLB or 
the BTLB, and when it is permitted load access from the addressed 
page. The operation is treated similarly to a byte load operation with 
respect to coherency. 


If the address translation does not hit in the UTLB or BTLB, or if it 
does not have load access permission, the instruction is treated as a 
no-op. 


If the access is directed to a cache-inhibited page, or to an 1/O 
controller interface segment, then the bus operation occurs, but the 
cache is not updated. 


This instruction never affects the reference or change bits in the 
hashed page table. 


While the MPC601 maintains a cache line size of 64 bytes, the dcbt 
instruction may only result in the fetch of a 32-byte sector (the one 
directly addressed by the EA). The other 32-byte sector in the cache 
line may or may not be fetched, depending on activity in the dynamic 
memory queue. 


Asuccessful debt instruction will affect the state of the TLB and 
cache LRU bits as defined by the LRU algorithm. 















Data The EAis the sum (rA]0)+(rB). 

ce The debtst instruction operates exactly like the debt instruction as 
es implemented on the MPC601. 

Touch for 

Store 
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Table 3-43. User-Level Cache Instructions (Continued) 


Operand , 
ee 


Cache Line This is a POWER instruction, and is not part of the PowerPC 
Compute architecture. This instruction will not be supported by other 
Size PowerPC implementations. 























This instruction places the cache line size specified by operand rA 
into register rD. The rA operand is encoded as follows: 


01100 Instruction cache line size (returns value of 64) 
01101 Data cache line size (returns value of 64) 
01110 Minimum line size (returns value of 64) 

01111 Maximum line size (return value of 64) 


All other encodings of the rA operand return undefined values. 
This instruction is specific to the MPC601. 

















Data The EA is the sum (rA|0)+(rB). 
Seay t If the block (the cache sector consisting of 32 bytes) containing the 
ra as : byte addressed by the EAis in the data cache, all bytes are cleared 


to 0. 


If the block containing the byte addressed by the EA is not in the data 
cache and the corresponding page is caching-allowed, the block is 
established in the data cache without fetching the block from main 
memory, and all bytes of the block are cleared to 0. 





















If the page containing the byte addressed by the EAis caching- 
inhibited or write-through, then the system alignment exception 
handler is invoked. 





If the block containing the byte addressed by the EA is in coherence 
required mode, and the block exists in the data cache(s) of any other 
processor(s), it is kept coherent in those caches. 


The debz instruction is treated as a store to the addressed byte with 
respect to address translation and protection. 


If the EA corresponds to an I/O controller interface segment 
(SR[T]=1), the debz instruction is treated as a no-op. 















Data 
Cache 
Block Store 






The EAis the sum(rA]0)+(rB). 


If the block (the cache sector consisting of 32 bytes) containing the 
byte addressed by the EA is in coherence required mode, and a block 
containing the byte addressed by the EA is in the data cache of any 
processor and has been modified, the writing of it to main memory is 
initiated. 





The function of this instruction is independent of the write-through 
and cache-inhibited/allowed modes of the block containing the byte 
addressed by the EA. 


This instruction is treated as a load from the addressed byte with 
respect to address translation and protection. 


If the EA corresponds to an I/O controller interface segment 
(SR[T]=1), the debst instruction is treated as a no-op. 
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Table 3-43. User-Level Cache Instructions (Continued) 


Operand ; 
jane | aamenc| ams | vomton 


Data The EAis the sum (rA|0)+(rB). 
Cache The action taken depends on the memory mode associated with the 
Block Flush target, and on the state of the block. The following list describes the 
action taken for the various cases, regardless of whether the page or 
block containing the addressed byte is designated as write-through or 
if it is in the caching-inhibited or caching-allowed mode. 

* Coherency Required (WIM = xx1) 



















— Unmodified Block—Invalidates copies of the block in the 
caches of all processors. 






— Modified Block—Copies the block to memory. 
Invalidates copies of the block in the caches of all 
processors. : 







— Absent Block—lf modified copies of the block are in the 
caches of other processors, causes them to be copied to 
memory and invalidated. If unmodified copies are in the 
caches of other processors, causes those copies to be 
invalidated. 











*  Coherency Not Required (WIM = xx0) 


— Unmodified Block—!nvalidates the block in the 
processor’s cache. 








— Modified Block—Copies the block to memory. 
Invalidates the block in the processor’s cache. 






Absent Block—Does nothing. 





3.8.3 Segment Register Manipulation Instructions 


The instructions listed in Table 3-44 provide access to the segment registers of the 
MPC6()1. These instructions operate completely independently of the MSR[IT] and 
MSR[DT] bit settings. Note that the rA operand is not defined for the mtsrin and mfsrin 
instructions in the MPC601. Refer to Section 2.3.3.1, “Synchronization for Supervisor- 
Level SPRs, and Segment Registers,” for serialization requirements and other 
recommended precautions to observe when manipulating the segment registers. 
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Table 3-44. Segment Register Manipulation Instructions 


Operand 
Roe (onl sn 


Move to The contents of rS is placed into segment register specified by 
Segment operand SR. 
Register 





















This is a supervisor-level instruction. 












Move to 
Segment 
Register 
Indirect 


The contents of rS are copied to the segment register selected by bits 
0-3 of rB. 


This is a supervisor-level instruction. 






Move from The contents of the segment register specified by operand SR are 
Segment placed into rD. 


Register This is a supervisor-level instruction. 







Move from 
Segment 
Register 
Indirect 







The contents of the segment register selected by bits 0-3 of rB are 
copied into rD. 


This is a supervisor-level instruction. 


3.8.4 Translation Look-Aside Buffer Management Instructions 


The MPC60)1 implements a TLB that caches portions of the page table. As changes are 
made to the address translation tables, the TLB must be updated. This is done by explicitly 
invalidating TLB entries (both in the set) with the Translation Lookaside Buffer Invalidate 
Entry (tlbie) instruction. 


Because the presence, absence, and exact semantics of various translation lookaside buffer 
management instructions are implementation dependent, system software should 
encapsulate uses of such instructions into subroutines to minimize the impact of migrating 
from one implementation to another. 


3.9 External Control Instructions 


The external control instructions provide a means for a user-level program to communicate 


with a special-purpose device. Two instructions are provided and are summarized in 
Table 3-46. 
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Table 3-45. Translation Lookaside Buffer Management Instruction 


Operand . 
fone [atmmente | Sent | vento 


Translation The effective address is the contents of rB. If the TLB contains an 
Lookaside entry corresponding to the EA, that entry is removed from the TLB. 
Buffer The TLB search is done regardless of the settings of MSR[IT] and 
invalidate MSRIDT]. Also, a TLB invalidate operation is broadcast on the 
system bus. 



















Block address translation for the EA, if any, is ignored. 






If the corresponding segment register for the EA specifies T=1 (an /O 
controller interface segment), no TLB entry invalidation is performed 
on the local processor and no TLB invalidate is broadcast. 








Because the MPC601 supports broadcast of TLB entry invalidate 
operations, the following must be observed: 






* The tlbie instruction must be contained in a critical section of 
memory controlled by software locking, so that the tlbie is issued 
on only one processor at a time. 

¢ Async instruction must be issued after every tlbie and at the end 
of the critical section. This causes hardware to wait for the effects 
of the preceding tibie instructions(s) to propagate to all 
processors. 


A processor detecting a TLB invalidate broadcast does the following: 











1. Prevents execution of any new load, store, cache control or 
tlbie instructions and prevents any new reference or change 
bit updates 


2. Waits for completion of any outstanding memory operations 
(including updates to the reference and change bits 
associated with the entry to be invalidated) 











3. Invalidates the two entries (both associativity classes) in the 
UTLB indexed by the matching address 


4. Resumes normal execution 











This is a supervisor-level instruction. 


Software must ensure that SDR 1 points to the page table when 

issuing tibie, even when address translation is disabled. Nothing is 
guaranteed about instruction fetching in other processors if tlbie 
deletes the page in which another processor is executing. 








3.10 Miscellaneous Simplified Mnemonics 


In order to make assembly language programs simpler to write and easier to understand, a 
set of simplified mnemonics are provided that define a shorthand for some of the most 
frequently used instructions. PowerPC compliant assemblers provide the simplified 
mnemonics listed here, and in the sections describing the branch, arithmetic, compare, trap, 
rotate and shift, and move to/from special purpose register instructions. Programs written 
to be portable across the various assemblers for the PowerPC architecture should not 
assume the existence of mnemonics not defined in this user’s manual. 





MOTOROLA Chapter 3. Addressing Modes and Instruction Set Summary 3-91 


Table 3-46. External Control Instructions 


Operand : 
a eee 


External rD,rA,rB The EAis the sum (rA]0)+(rB). 

eile If the external access register (EAR) E-bit (bit 0) is set to 1, aload 

ae d request for the physical address corresponding to the EAis sent to 

xe the device identified by the EAR Resource ID bits (bits 26-31), 

bypassing the cache. The word returned by the device is placed in 
rD. The EA sent to the device must be word aligned. 
If the EAR[E]=0, a data access exception is invoked, with bit 11 of 
DSISR set to 1. : 
The eciwx instruction is supported for EAs that reference ordinary 
memory segments (SR[T]=0), for EAs mapped by BAT registers, and 
for EAs generated when MSR[DT]=0.The instruction is treated as a 
no-op for EAs in I/O controller interface segments (SR[T]=1). 
The access caused by this instruction is treated as a load from the 
location addressed by the EA with respect to protection and 
reference and change recording. 


3.10.1 No-op 


Many PowerPC instructions can be coded in a way such that, effectively, no operation is 
performed. An additional mnemonic is provided for the preferred form of no-op. If an 
implementation performs any type of run-time optimization related to no-ops, the preferred 
form is the no-op that will trigger this. 

























External The EAis the sum (rA|0)+(rB). 

Control out . Tee Re 

Word If the External Access Register (EAR) E-bit (bit 0) is set to 1, a store 
ae ‘ request for the physical address corresponding to the EA and the 


contents of rS are sent to the device identified by EAR[RID] (resource 
ID) (bits 26-31), bypassing the cache. The EA sent to the device must 
be word aligned. 


If the EAR[E]=0, a data access exception is invoked, with bit 11 of 
DSISR set to 1. 


The ecowx instruction is supported for EAs that reference ordinary 
memory segments (SR[T]=0), for EAs mapped by BAT registers, and 
for EAs generated when MSR[DT]=0.The instruction is treated as a 
no-op for EAs in I/O controller interface segments (SR[T]=1). 





The access caused by this instruction is treated as a load from the 
location addressed by the EA with respect to protection and 
reference and change recording 


no-op (equivalent to ori 0,0,0) 


3.10.2 Load Immediate 


The addi and addis instructions can be used to load an immediate value into a register. 
Additional mnemonics are provided to convey the idea that no addition is being performed 
but that data us being moved from the immediate operand of the instruction to a register. 
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Load a 16-bit signed immediate value into rA: 
lirA,value (equivalent to addi rA,0,value) 


Load a 16-bit signed immediate value, shifted left by 16 bits, into rA: 
lis rA,value (equivalent to addi rA,0,value) 


3.10.3 Load Address 


This mnemonic permits computing the value of a base-displacement operand, using the 
addi instruction which normally requires a separate register and immediate operands. 


la rD,SIMM(rA) (equivalent to addi rD,rA,SIMM) 


The la mnemonic is useful for obtaining the address of a variable specified by name, 
allowing the assembler to supply the base register number and compute the displacement. 
If the variable v is located at offset SIMMyv bytes from the address in register rv, and the 
assembler has been told to use register rv as a base for references to the data structure 
containing v, then the following line causes the address of v to be loaded into register rD. 


la rD,v (equivalent to addi rD,rA,SIMMy 


3.10.4 Move Register 


Several PowerPC instructions can be coded to simply copy the contents of one register to 
another. An extended mnemonic is provided to move data from one register to another with 
no computational activity. 


The following instruction copies the contents of register rS into register rA. This 


mnemonic can be coded with a “.” to cause the condition register update option to be 
specified in the underlying instruction. 


mr rA,rS (equivalent to or rA,rS,rB) 


3.10.5 Complement Register 


Several PowerPC instructions can be coded to complement the contents of one register and 
place the result in another register. A simplified mnemonic is provided that complements 
the contents of rS and places the results into register rA. This mnemonic can be coded with 
a “.” to cause the condition register update option to be specified in the underlying 
instruction. 


not rA,rS (equivalent to nor rA,rS,rB) 
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Chapter 4 
Cache and Memory Unit Operation 


The MPC601 contains a 32-Kbyte, eight-way set associative, unified (instruction and data) 
cache. The cache line size is 64 bytes, divided into two eight-word sectors, each of which 
can be snooped, loaded, cast-out, or invalidated independently. The cache is designed to 
adhere to a write-back policy, but the MPC601 allows control of cacheability, write policy, 
and memory coherency at the page and block level. The cache uses a least recently used 
(LRU) replacement policy. 


The MPC6(0)1’s on-chip cache is non-blocking. Burst operations to the cache are buffered 
such that the cache update is reduced to two single-cycle operations of four words. That is, 
the results of the first two and the last two bursts are buffered and written to the cache in 
single cycles apiece. This frees the cache to perform lower priority operations in the 
meantime. 


System operations, including cache operations, connect to the system interface through the 
memory unit, which includes a two-element read queue and a three-element write queue. 


The cache provides an eight-word interface to the rest of the device. The surrounding logic 
selects, organizes, and forwards the requested information to the requesting unit. Write 
operations to the cache can be performed on a byte basis, and a complete read-modify-write 
operation to the cache can occur in each cycle. 


The cache unit and the memory unit coordinate cache reload and cast-out operations so that 
a cache miss does not block the use of the cache for other operations during the next cycle. 
Cache reload operations always occur on a sector basis, with the option of reloading the 
additional sector as a low-priority operation. On loads and fetch operations, the critical data 
is forwarded to the requesting unit without waiting for the entire cache line to be loaded. 


The MPC60)1 maintains cache coherency in hardware by coordinating activity between the 
cache, the memory unit, and the bus interface logic. As bus operations are performed on the 
bus by other devices, the MPC6()1 bus snooping logic monitors the addresses that are 
referenced. These addresses are compared with the addresses resident in the cache. The 
cache unit uses a second port into its tag directory to check for a matching entry and the 
memory queue unit does the same. If there is a snoop hit, the MPC6(0)1’s bus snooping logic 
responds to the bus interface with the appropriate snoop status. An additional snoop action 
may be forwarded to the cache or to the memory unit as a result of a snoop hit. 
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Note that in this chapter the term multiprocessor is used in the context of maintaining cache 
coherency, although the system could include other devices that can access system memory, 
maintain their own caches, and function as bus masters requiring cache coherency. 


This chapter describes the organization of the MPC601’s on-chip cache, the MESI cache 
coherency protocol, special concerns for cache coherency in single- and multiple-processor 
systems, cache control instructions, various cache operations, and the interaction between 
the cache and the memory unit. 


4.1 Cache Organization 


The cache is configured as eight sets of 64 lines. Each line consists of two sectors, four state 
bits (two per sector), an address tag, and several bits to maintain the LRU function. The two 
state bits implement the four-state MESI (modified-exclusive-shared-invalid) protocol. 
Each sector contains eight 32-bit words. Note that PowerPC architecture defines the 


. cacheable unit as a block, which is a sector in the MPC601. 


The instruction unit accesses the cache frequently in order to maintain the flow of 
instructions through the instruction queue. The queue is eight words (one sector) long, so 
an entire sector can be loaded into the instruction unit on a single clock cycle. 


The cache organization is shown in Figure 4-1. Note that the replacement algorithm is 
strictly an LRU algorithm; that is, the least recently used sector is used, which may mean 
that a modified sector will be replaced on a miss if it is the least recently used, even if 
invalid sectors are available. However, for performance reasons, certain conditions (for 
example, the execution of some cache instructions) generate accesses to the cache without 
modifying the bits that perform the LRU function. 


Each cache line contains 16 contiguous words from memory that are loaded from a 
16-word boundary (that is, bits A26—A31 of the logical addresses are zero); as a result, 
cache lines are aligned with page boundaries. 


Note that address bits A20—A25 provide an index to select a line. Bits A26—A31 select a 
byte within a line. The tags consists of bits PAQ-PA19. Address translation occurs in 
parallel, such that higher-order bits (the tag bits in the cache) are physical. 
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Figure 4-1. Cache Organization 


4.2 Cache Arbitration 


The instruction unit and the integer unit both access the cache; however, the cache unit 
handles only one access per cycle. Furthermore, since the cache is nonblocking, a 
preceding cache operation may generate a cache reload operation which must also compete 
for cache access. The bus snooping logic may create additional snoop actions that use the 
cache. The MPC601 efficiently handles simultaneous requests to access the on-chip cache. 


The MPC601 implements cache arbitration logic to prioritize the various cache requests 
that can occur on each cycle. The cache unit provides a cache retry queue (CRTRY) if a 
caching operation cannot be completed. There are three entries in this queue, providing a 
buffer for one outstanding floating-point store, one integer store, and one instruction fetch. 
Priority is given first to floating-point stores, then to integer stores, and finally to instruction 
fetches. 


A Similar situation arises with respect to the bus. Internal bus arbitration logic chooses the 
highest priority operation from the memory queue for presentation onto the bus. These 
priorities are listed in Section 4.10.2, “Memory Unit Queuing Priorities.” 
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The MPC601 supports a fully-coherent 4-Gbyte physical memory address space. Bus 
snooping is used to drive a MESI four-state cache-coherency protocol that ensures the 
coherency of all processor and DMA transactions to and from global memory with respect 
to the processor's cache. The MESI protocol is described in Section 4.7.2, “MESI 
Protocol.” All potential bus masters must employ similar snooping and coherency-control 
mechanisms, 


4.3 Cache Access Priorities 
The MPC601 prioritizes pending cache operations as follows: 


1. Cache reloads. Note that the cache is non-blocking. Four-beat burst reloads on the 
system bus are buffered into two, single- cycle transactions of four words each, 
freeing the cache to perform lower priority operations in the meantime. 


2. Second-cycle cast-out operations when the additional sector is modified 


3. Snoop requests that hit in the tag directory. These generate a cache sector push 
operation. 


4. Floating-point store operations 


5. Integer operation retries. If a higher priority operation occurs when an integer 
operation is ready to cache its results, the results are held in a buffer until the higher 
priority operation completes, then it is retried on the next clock cycle. This prevents 
the integer unit‘from stalling when this situation occurs. 

6. Integer unit requests 


7. Instruction fetches 


4.4 Basic Cache Operations 


This section describes operations that can occur to the cache, and how these operations are 
implemented in the MPC601. 


4.4.1 Cache Reloads 


A cache sector is reloaded after a read miss occurs in the cache. The cache sector that 
contains the address is updated by a burst transfer of the data from system memory. Note 
that if a read miss occurs in a multiprocessor system, and the data is modified in another 
cache, the modified data is first written to external memory before the cache reload occurs. 


An instruction prefetch that is generated to fill the instruction queue (not explicitly required 
by the program flow) does not generate a reload operation in the case of a cache miss. 


4.4.2 Cache Cast-Out Operation 


The MPC601 uses an LRU replacement algorithm to determine which of the eight possible 
cache locations should be used for a cache update. Adding a new sector to the cache causes 
any modified data associated with the least recently used element to be written back, or cast 
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out, to system memory. This includes both sectors of the line, even though only one sector 
may be reloaded. Casting out of the adjacent sector is referred to as a second-cycle cast-out 
operation. 


4.4.3 Cache Sector Push Operation 


When a cache sector in the MPC601 is snooped and hit by another processor and the data 
is modified, the cache sector must be written to memory and made available to the snooping 
device. The cache sector that is hit is said to be pushed out onto the bus. The MPC601 
supports two kinds of push operations—normal push operations and enveloped high- 
priority push operations, which are described in Section 4.7.11, “Enveloped High-Priority 
Cache Sector Push Operation.” 


4.4.4 Optional Cache Sector Line-Fill Operation 


The two sectors in a cache line contain contiguous memory addresses; therefore, the two 
sectors share the same line address tag. Cache coherency, however, is maintained on a 
sector granularity, so there are separate coherency state bits for each sector. If one sector of 
the line is filled from memory, the MPC601 may attempt to load the other sector as a low- 
priority bus operation. 


If the sector is not transferred, the cache line in the snooping processor contains one sector 
that is in the shared state (the one that was transferred because of the snoop hit) and one 
sector that is invalid (if the optional cache line fill is not performed). 


Note that the optional reload of an adjacent sector on an instruction fetch miss can be 
disabled globally by.setting bit 26 in the HIDO register, and the optional reload of the 
adjacent sector on a load/store miss can be disabled by setting bit 27. 


4.5 Cache Data Transactions 


The MPC601 output signal TBST (transfer burst) indicates to the system whether the 
current transaction is a single-beat transaction or four-beat burst transfer. Burst transactions 
have an assumed address order. For cacheable load operations or cacheable, non-write- 
through store operations that miss the cache, the MPC601 presents the quad-word aligned 
address associated with the read or store that initiated the transaction. (Note that for 
optimizing programs to be used with subsequent PowerPC processors, programs should be 
double-word aligned.) 


As shown in Figure 4-2, this quad-word contains the address of the load or store that missed 
the cache. This minimizes latency by allowing the critical code or data to be forwarded to 
the processor before the rest of the sector is filled. For all other burst operations, however, 
the entire sector is transferred in order (oct-word aligned). 
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MPC601 Cache Address 
Bits (27..28) 
00 01 10 


If address requested is in double word A or B then the address placed on the bus are that of 
quad-word A, and the four data beats are ordered in the following manner: 


Beat 
0 3 
i ee wt Wie a Se we el 


If address requested is in double word C or D then the address placed on the bus will be that 
of quad-word C, and the four data beats are ordered in the following manner: 


Beat 
1 2 


3 
ae Se | re: See Ree ae 


Figure 4-2. Quad-Word Address Ordering 


i 


4.6 Access to I/O Controller Interface Segments 
The MPC601 supports two kinds of operations that involve the I/O controller interface: 


¢ 1/Ocontroller interface operations. These operations are considered to address the 
noncoherent and noncacheable I/O controller interface; therefore, the MPC601 does 
not maintain coherency for these operations, and the cache is bypassed completely. 


* Memory-forced I/O controller interface operations. These operations are considered 
to address memory space and are therefore subject to the same coherency control as 
memory accesses. These operations are global memory references within the 
MPC6()1 and are considered to be noncacheable and write-through. 


Cache behavior (write-back, cache-inhibition, and enforcement of MESI coherency) for 
these operations is determined by the settings of the WIM bits. See 6.3, ‘““Memory/Cache 
Access Modes.” 


4.7 Cache Coherency 


The primary objective of a coherent memory system is to provide the same image of 
memory tc all devices using the system. Coherency allows synchronization, cooperative 
use of shared resources, and task migration among the processors. Otherwise, for example, 
a device performing a store operation would require exclusive access to the addressed 
sector before making an update to prevent another device from using stale data. 
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Each potential bus master must follow rules for managing the state of its cache. For 
example, a device must broadcast its intention to read a sector that is not currently in the 
cache. It must also broadcast the intention to write into a sector that is currently not owned 
exclusively. Other devices respond to these broadcasts by snooping their caches for the 
broadcast addresses and reporting status back to the originating device. The status returned 
includes a shared indicator (another device has a copy of the addressed sector) and a retry 
indicator (another device either has a modified copy of the addressed sector that it needs to 
push out of the chip, or another device had a problem that prevented appropriate snooping). 


For faster performance, the MPC6()1 has a second path into the cache directory so snooping 
and mainstream instruction processing occur concurrently. Instruction processing is 
interrupted only when the snoop control logic detects a state change or that a snoop push 
of modified data is required to maintain memory coherency. 


To maintain coherency, secondary caches must forward all relevant system bus traffic onto 
the MPC601 bus, which takes the appropriate actions to maintain the MESI protocol. 


Support for lwarx and stwex. instructions on noncacheable pages may be somewhat more 
complicated for a secondary cache than normal cacheable memory accesses. This is 
because the secondary cache may not normally forward writes to noncacheable pages in the 
processor. However, to maintain the reservation coherency bit, the secondary cache must 
know to forward all writes that hit against a specified address. 


4.7.1 Memory Management Access Mode Bits—W, I, and M 


Some memory characteristics can be set on either a block or page basis by using the WIM 
bits in the BAT registers or page table entry (PTE) respectively. The WIM bits control the 
following functionality: 


¢ Write-through (W bit) 
¢ Caching-inhibited (I bit) 
* Memory coherency (M bit) 


These bits allow both single- and multiprocessor-system designs to exploit numerous 
system-level performance optimizations. These bits are described in detail in Chapter 2, 
“Registers and Data Types,” and Chapter 6, “Memory Management Unit.” Using these bits 
carelessly can cause coherency problems—such as when flushing pages that correspond to 
the changed WIM bits from the caches of all devices in the system or when the address 
translations of aliased physical addresses specify different values for any of the WIM bits. 
The MPC601 considers either of these cases to be a programming error that may 
compromise memory coherency. These paradoxes can occur within a single processor or 
across several devices, as described in Section 4.7.5.1, “Coherency in Single-Processor 
Systems,” and Section 4.7.5.2, “Coherency in Multiprocessor Systems.” 
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4.7.2 MESI Protocol 


The MPC601 cache characterizes each 32-byte sector it contains as being in one of four 
MESI states. Addresses presented to the cache are indexed into the cache directory with bits 
A20-—A25 and the upper-order 20 bits from the physical address translation (PAQ-PA19) are 
compared against the indexed cache directory tags. If no tags match, the result is a cache 
miss. If a tag matches, a cache hit occurred and the directory indicates the state of the sector 
through two state bits kept with the tag. The four possible states for a sector in the cache 
are the invalid state (I), the shared state (S), the exclusive state (E), and the modified state 
(M). The four MESI states are defined in Table 4-1 and illustrated in Figure 4-3. 


Table 4-1. MESI State Definitions 


Modified (M) | The addressed sector is valid in the cache and in only this cache. The sector is modified with 
respect to system memory—that is, the modified data in the sector has not been written back to 
memory. 


Exclusive (E) | The addressed sector is in this cache only. The data in this sector is consistent with system 
memory. 


Shared (S) The addressed sector is valid in the cache and in at least one other cache. This sector is always 
consistent with system memory. That is, the shared state is shared-exclusive; there is no shared- 
modified state. 


Invalid (I) This state indicates that the addressed sector is not resident in the cache. 
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Modified in Cache A Shared in Cache A 


Cache B Cache A Cache B 





System Memory 
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Exclusive in Cache A Invalid in Cache A 





Cache A Cache B Cache A Cache B 













Figure 4-3. MESI States 


4.7.3 MESI State Diagram 


The MPC601 provides dedicated hardware to provide memory coherency by snooping bus 
transactions. The address retry capability of the MPC601 enforces the MESI protocol, as 
shown in Figure 4-4. Figure 4-4 assumes that the WIM bits are set to 001; that is, write- 
back, caching-not-inhibited, and memory coherency enforced. 





MOTOROLA Chapter 4. Cache and Memory Unit Operation 4-9 


Table 4-7 gives a detailed list of MESI transitions for various operations and WIM bit 
settings. 


SHR 


SHW 





Qe 
m2) 
SH 


RH 





WH 


BUS TRANSACTIONS 


RH = Read Hit @ = Snoop Push 
RMS = Read Miss, Shared 
RME = Read Miss, Exclusive ®) = Invalidate Transaction 
WH = Write Hit 

WM = Write Miss ae) = Read-with-Intent-to-Modify 
SHR = Snoop Hit on a Read 
SHW = Snoop Hit on a Write or @) = Cache Sector Fill 

Read-with-Intent-to-Modify 


Figure 4-4. MESI Cache Coherency Protocol—State Diagram (WIM = 001) 


4.7.4 MESI Hardware Considerations 


In addition to the hardware required to monitor bus traffic for coherency, the MPC601 has 
a cache port dedicated to snooping so that comparing cache entries to address traffic on the 
bus does not affect the MPC6()1's on-chip cache. 


The global (GBL) signal, asserted as part of the address attribute field, enables the snooping 
hardware of the MPC601. Address bus masters assert GBL to indicate that the current 
transaction is a global access (that is, an access to memory shared by more than one device). 
If GBL is not asserted for the transaction, that transaction is not snooped. 
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Normally, GBE reflects the M-bit value specified for the memory reference in the 
corresponding translation descriptor(s). Care must be taken to minimize the number of 
pages marked as global, because the retry protocol enforces coherency and can use 
considerable bus bandwidth if much data is shared. Therefore available bus bandwidth can 
decrease as more traffic is marked global. Note that in Figure 4-4, write hits to unmodified 
lines of nonglobal pages do not generate invalidate broadcasts. 


The MPC601 snoops a transaction if the transfer start (TS) and GBL inputs are asserted 
together in the same bus clock (this is a qualified snooping condition). No snoop update to 
the MPC60)1 cache occurs if the snooped transaction is not marked global. This includes 
invalidation cycles. 


When the MPC601 detects a qualified snoop condition, the address associated with the TS 
is compared with the cache tags through a dedicated cache-tag snoop port. Snooping 
tinishes if no hit is detected. If, however, the address hits in the cache, the MPC6(0)1 reacts 
according to the MES] protocol shown in Figure 4-4. 


Because they do not require snooping, cache sector cast-outs, snoop pushes, and table- 
search operations do not assert GBL. The MPC601 marks these transactions as nonglobal. 


To facilitate external monitoring of the internal cache tags, the cache set member signals 
(CSEO-CSE2) represent in binary the sector of the cache set being replaced on read 
operations (including read-with-intent-to-modify operations). This does not apply and is 
not necessary for write operations to memory. Note that these signals are valid only for 
MPC601 burst operations. Table 6-2 shows the CSE encodings. 


Table 4-2. CSEO—-CSE2 Signals 


ee 
011 


Set 3 


[on [sas | 


4.7.5 Coherency Precautions 


Cache coherency is greatly affected by whether the MPC6()1 is used in a single- or 
multiple-processor implementation. This section describes precautions for implementing 
coherent single- and multiple-processor systems. 
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4.7.5.1 Coherency in Single-Processor Systems 
The following situations concerning coherency can be encountered within a single 
processor implementation: 


¢ Load or store to a cache-inhibited page (WIM = b'X1X’) and a cache hit occurs 


Caching is inhibited for this page (I = 1). Load or store operations to a cache- 
inhibited page that hit in the cache cause a paradox. If the addressed sector is not 
modified, the MPC601 invalidates the sector and performs the memory access. If the 
addressed sector in the cache line is modified, the MPC6()1 flushes the modified 
sector before accessing memory. 


¢ Store to a page marked write-through (WIM = b'10X’) and a cache hit to a modified 
sector 


This page is marked as write-through (W = 1). Store operations to a write-through 
page that hit a modified sector are considered coherency paradoxes by the processor. 
The MPC601 pushes the modified sector to memory and marks the sector exclusive 
(E). Then the MPC601 writes the data into the cache, marking it exclusive and 
passing on a write-with-flush operation (to the memory queue). 


Note that when WIM bits are changed, it is critical that the cache contents should reflect the 
new WIM bit settings. For example, if a block or page that had allowed caching becomes 
caching-inhibited, the appropriate cache sectors should be updated to leave no indication 
that caching had previously been allowed. 


4.7.5.2 Coherency in Multiprocessor Systems 


Other situations concerning coherency can occur across multiple processors (or systems 
that employ multiple devices that incorporate caches). Paradoxes in multiprocessor 
systems are particularly difficult to handle since some scenarios cause modified data to be 


purged and others may lead to bus deadlock scenarios. , 


Most multiprocessor paradoxes center around the interprocessor coherency of the memory 
coherency bit (the M bit). Improper use of the M bit can lead to multiple devices accepting 
a cache sector and marking the data as exclusive, leading to the possibility of the same 
cache line being modified in multiple caches. 
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Although these coherency paradoxes are considered programming errors, the MPC6()1 
attempts to handle the offending conditions and minimize the negative effects on memory, 
coherency. Note that the intent of this effort is to ease the debugging of multiprocessor 
operating system development. The following lists some of the operations provided by the 
MPC601: 


¢ Noncacheable write operations appear on the processor bus as write-with-flush 
operations, which forces other processors with modified copies of the addressed 
sector to write data back to memory and to mark the sector as invalid in the cache. 
Devices with an unmodified copy of the sector must mark the sector as invalid in 
their caches. 

¢ All noncacheable read operations appear on the MPC601 bus as read (with clean) 
operations, which forces processors with modified copies of the addressed data to 
write the data back to memory before the read operation completes. 


Note that when WIM bits are changed, it is critical that the cache contents should reflect the 
new WIM bit settings. For example, if a block or page that had allowed caching becomes 
caching-inhibited, the appropriate cache sectors should be updated to leave no indication 
that caching had previously been allowed. 


Additional information on bus operations that are generated for specific instructions and 
state conditions can be found in Chapter 9, “System Interface Operation.” 


4.7.6 Memory Loads and Stores 
Table 4-3 provides a general overview of memory coherency actions performed by the 
MPC6()1on load operations. 


Table 4-3. Memory Coherency Actions on Load Operations 


Bus | raion | 


sees me 
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Noncacheable cases are not part of this table. The first three cases also involve selecting a 
replacement class and casting-out modified data that may have resided in that replacement 
class. 







Table 4-4 provides an overview of memory coherency actions on store operations. This 
table does not include noncacheable or write-through cases nor does it completely describe 
the exact mechanisms for the operations described. It describes generally what happens 
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within the chip. The read-with-intent-to-modify (RWITM) examples involve selecting a 
replacement class and casting-out modified data that may have resided in that replacement 
class. 


Table 4-4. Memory Coherency Actions on Store Operations 


Bus 


Negated Load data, modify it, mark M 





4.7.7 Atomic Memory References 


The Iwarx/stwex. instruction combination can be used to perform atomic memory 
references. These instructions are described in Chapter 3, “Addressing Modes and 
Instruction Set Summary,” and Chapter 10, “Instruction Set.” 


4.7.8 Snoop Response to Bus Operations 


When the MPC601 is not the bus master, it monitors bus traffic and performs cache and 
memory-queue snooping as appropriate. The snooping operation is triggered by the receipt 
of a qualified snoop request. A qualified snoop request is generated by the simultaneous 
assertion of the TS and GBE bus signals. 


Instruction processing is interrupted only when a snoop hit occurs and the snoop state 
machine determines that an additional cache snoop is required to resolve the coherency of 
the offended sector. 


The MPC601 maintains a write queue of bus operations in progress and/or pending 
arbitration. This write queue must also be snooped in response to qualified snoop requests. 


The MPC601 drives two snoop status signals (ARTRY and SHD) in response to a qualified 
snoop request that hits. These signals provide information about the state of the addressed 
sector for the current bus operation. These signals are described fully in Chapter 8, “Signal 


- Descriptions.” 


4.7.9 Cache Reaction to Specific Bus Operations 


There are several bus transaction types defined for the MPC601 bus. The MPC60)1 must 
snoop these transactions and perform the appropriate action to honor their intention to 
maintain memory coherency; see Table 4-5. 
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A processor may assert ARTRY for any bus transaction due to internal conflicts that prevent 
the appropriate snooping. In general, if ARTRY is not asserted, each snooping processor 
must take full ownership for the effects of the bus transaction with respect to the state of 
the processor. The processor can assert ARTRY if an internal conflict prevents it from 
snooping properly. 


The transactions in Table 4-5 correspond to the transfer type signals TTO-TT4, which are 
described in Section 8.2.4.1, “Transfer Type (TTO-TT4).” 


Table 4-5. Response to Bus Transactions 


Transaction Response 


Clean block The clean operation is an address-only bus transaction, initiated by executing a dcbst 
instruction. This operation affects only sectors marked as modified (M). Assuming the 
GEL signal is asserted, modified sectors are pushed out to memory, changing the state 
to E. 


Flush block The flush operation is an address-only bus transaction initiated by executing adcbf 
instruction. Assuming the GBL signal is asserted, the flush block operation results in the 
following: 
¢ Ifthe addressed sector is shared or exclusive, an additional snoop action is 

generated internally that invalidates the addressed sector. 

+ If the addressed sector is in the M state, ARTRY is asserted and an additional 
internally generated snoop action is initiated that pushes the modified sector out of the 
cache and invalidates the sector. 

+ If HIDO[31] = 0, and any bus read operation is pending during this snoop operation, 
the write-back of the modified sector is considered to be a high-priority bus operation 
that may be enveloped within the pending load operation. 

* If HIDO[31} = 1, and any bus read operation with HP_SNP_REQ asserted is pending 

during this snoop operation, the write-back of the modified sector is considered to be a 

high-priority bus operation that may be enveloped within the pending load operation. 

¢ If the addressed sector hits any of the three entries in the write queue, that entry is 
tagged as a high-priority push, after which it can be loaded from memory. 


Write with flush Write-with-flush and write-with-flush-atomic operations occur after the processor issues 
Write with flush atomic a store or stwex. instruction, respectively. 

* If the addressed sector is in the shared or exclusive state, an additional snoop action 
is generated internally that forces the state of the addressed sector to invalid. 

+ If the addressed sector is in the modified state, the ARTRY is asserted and an 
additional, internally generated snoop action is initiated that pushes the modified 
sector out of the cache and changes the state of the sector to invalid. 

+ If HIDO[31] = 0, and any bus read operation is pending during this snoop operation, 
the write-back of the modified sector is considered to be a high-priority bus operation 
that may be enveloped within the pending load operation. 

* If HIDO[31] = 1, and any bus read operation with HP_SNP_REG asserted is pending 

during this snoop operation, the write-back of the modified sector is considered to be a 

high-priority bus operation that may be enveloped within the pending load operation. 

* If the addressed sector hits any of the three entries in the write queue, that entry is 
tagged as a high-priority push operation. 
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Table 4-5. Response to Bus Transactions (Continued) 


Transaction Response 


Kill block The kill-block operation is an address-only bus transaction initiated when one of the 
following occurs: 

¢ a dcbi instruction is executed 

* a dcbz operation to a block marked S or | is executed 

* awrite operation to a block marked S occurs 


If a snoop hit occurs, an additional snoop is initiated internally and the sector is forced to 
the I state, effectively killing any modified data that may have been in the sector. The 
three-entry write queue is also snooped, and if a queue entry hits, itis purged. 


















Write with kill In a write-with-kill operation, the processor eventually snoops the cache for a copy of 
the addressed sector. If one is found, an additional snoop action is initiated internally 
and the sector is forced to the | state, killing modified data that may have been in the 
sector. In addition to snooping the cache, the three-entry write queue is also snooped. A 


kill operation that hits an entry in the write queue purges that entry from the queue. 



































Read 
Read atomic 


The read operation is used by most single- and burst reads on the bus. Aread on the 

bus with the GEL signal asserted causes the following responses: 

* If the addressed sector is in the cache but is invalid, the MPC601 takes no action. 

* If the sector is in the shared state, the MPC601 asserts the shared snoop status 
indicator. 

+ If the sector is in the E state, the MPC601 asserts the shared snoop status indicator 
and initiates an additional snoop action to change the state of that sector from E to S. 

* If the sector is in the cache in the M state, the MPC601 asserts both the ARTRY and 
the SHD snoop status signals. It also initiates an additional snoop action to push the 
modified sector out of the chip and to mark that cache sector as shared. 


Read atomic operations appear on the bus in response to lwarx instructions and 
generate the same snooping responses as read operations. 










Read with intent to modify 
(RWITM) 
RWITM atomic 


An RWITM operation is issued to acquire exclusive use of a memory location for the 

purpose of modifying it. 

« If the addressed sector is in the | state, the MPC601 takes no action. 

+ If the addressed sector is in the cache and in the S or E state, the MPC6071 initiates an 
additional snoop action to change the state of the cache sector to I. 

¢ If the addressed sector is in the cache and in the M state, the MPC601 asserts both 
the ARTRY and the SHD snoop status signals. It also initiates an additional snoop 
action to push the modified sector out of the chip and to change the state of that sector 
in the cache from M to l. 


The RWITM atomic operations appear on the bus in response to stwex. instructions 
and are snooped like RWITM instructions. 


The sync instruction causes an address-only bus transaction. The MPC601 asserts the 
ARTRY snoop status if there are any TLB-related snoop operations pending in the chip. 
This transaction is also generated by the eieio instruction on the MPC601. 





TLB invalidate ATLB invalidation operation is caused by executing a tlbie instruction. This instruction 
transmits the MPC601's TLB index (bits 12-19 of the EA) onto the system bus. Other 


processors on the bus invalidate TLB entries associated with EAs that match those bits. 










V/O reply The I/O reply operation is part of the /O controller interface operation. It serves as the 
final bus operation in the series of bus operations that service an I/O controller interface 


operation. 
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4.7.10 Internal ARTRY Scenarios 
The following scenarios, along with others, cause the MPC601 to assert the ARTRY signal. 
« Snoop hits to a sector in the M state (optional on kill requests) 
¢ Snoop hits when a reload dump request is active 
¢ Snoop hits on a valid (that is, not cancelled) operation that is queued internally. 
¢ Snoop hits while a cast-out request is pending during this or the next clock cycle. 


4.7.11 Enveloped High-Priority Cache Sector Push Operation 


If the MPC601 has a read operation outstanding on the bus and another pipelined bus 
operation hits against a modified sector, the MPC601 provides a high-priority push 
operation. This transaction is enveloped within the address and data tenures of a read 
operation. This feature prevents deadlocks in system organizations that support multiple 
memory-mapped buses. More specifically, the MPC601 internally detects the scenario 
where a load request is outstanding and the processor has pipelined a write operation on top 
of the load. Normally, when the data bus is granted to the MPC601, the resulting data bus 
tenure is used for the load operation. The enveloped high-priority cache sector push feature 
defines a bus signal, the data bus write only qualifier (DBWO), which, when asserted with 
a qualified data-bus grant, indicates that the resulting data tenure should be used for the 
store operation instead. This signal is described in Section 9.10, “Using DBWO (Data Bus 
Write Only.” DBWO asserted at any other time is considered a no-op to the MPC601 with 
respect to the ordering of the data bus tenures of pipelined bus operations. Note that the 
enveloped copy-back operation is an internally pipelined bus operation. 


4.8 Cache Control Instructions 


Software must use the appropriate cache management instructions to ensure that caches are 
kept consistent when data is modified by the processor or by input data transfer. When a 
processor alters a memory location that may be contained in an instruction cache, software 
must ensure that updates to memory are visible to the instruction fetching mechanism. 
Although the instructions to enforce coherency vary among implementations and hence 
many operating systems will provide a system service for this function, the following 
sequence is typical: 

1. dcbst (update memory) 

2. sync (wait for update) 

3. icbi (invalidate copy in cache) 

4, isyne (invalidate copy in own instruction buffer 
These operations are necessary because the memory may be in write-back mode. Since 


instruction fetching may bypass the data cache, changes made to items in the data cache 
may not be reflected in memory until after the instruction fetch completes. 
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The PowerPC architecture defines instructions for controlling both the instruction and data 
caches. Instruction cache control instructions are valid instructions on the MPC601, but 
may function differently than they do when used on PowerPC processors that have separate 
instruction and data caches. 


Data caches and unified caches must be kept consistent with other data caches, combined 
caches, memory, and I/O data transfers. However, to ensure consistency, aliased effective 
addresses (two effective addresses that map to the same physical address) must have the 
same page attributes (WIM bits). 


Note that in the PowerPC architecture, the term cache block, or simply block when used in 
the context of cache implementations, refers to the unit of memory at which coherency is 
maintained. For the MPC601 this is the eight-word sector. This value may be different for 
other PowerPC implementations. In-depth descriptions of coding these instructions is 
provided in Chapter 3, “Addressing Modes and Instruction Set Summary,” and Chapter 10, 
“Instruction Set.” 


4.8.1 Cache Line Compute Size Instruction (clcs) 

The cles instruction places the cache information specified in the instruction into a target 
register. This instruction is used by the POWER architecture to determine the maximum 
and minimum line sizes for cache implementations. For a complete description of this 
instruction, refer to Chapter 10, “Instruction Set.” 


4.8.2 Data Cache Block Touch Instruction (dcbt) 


This instruction provides a method for improving performance through the use of software- 
initiated prefetch hints. The MPC601 performs the fetch for the cases when the address hits 
in the UTLB or the BTLB, and when it is permitted load access from the addressed page. 
The operation is treated similarly to a byte load operation with respect to coherency. 


If the address translation does not hit in the UTLB or BTLB, or if it does not have load 
access permission, the instruction is treated as a no-op. 


If the access is directed to a cache-inhibited page, or to an I/O controller interface segment, 


then the bus operation occurs, but the cache is not updated. 


This instruction never affects the reference or change bits in the hashed page table. 


While the MC98601 maintains a cache line size of 64 bytes, the debt instruction may only 
result in the prefetch of a 32-byte sector (the one directly addressed by the EA). The other 
32-byte sector in the cache line may or may not be fetched, depending on activity in the 
dynamic memory queue. 


A successful debt instruction will affect the state of the UTLB and cache LRU bits as 
defined by the LRU algorithm. 
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Note that other PowerPC implementations may not take any action based on the execution 
of this instruction, but they may prefetch the cache block corresponding to the EA into their 
cache. 


4.8.3 Data Cache Block Touch for Store Instruction (dcbtst) 


The dcbtst instruction behaves exactly like the dcbt instruction as implemented on the 
MPC601. 


4.8.4 Data Cache Block Set to Zero Instruction (dcbz) 


If the block (the cache sector consisting of 32 bytes) containing the byte addressed by the 
EA is in the data cache, all bytes are cleared to 0. 


If the block containing the byte addressed by the EA is not in the data cache and the 
corresponding page is caching-allowed, the block is established in the data cache without 
fetching the block from main memory, and all bytes of the block are cleared to 0). 


If the page containing the byte addressed by the EA is caching-inhibited or write-through, 
then the system alignment exception handler is invoked. 


If the block containing the byte addressed by the EA is in coherence required mode, and 
the block exists in the data cache(s) of any other processor(s), it is kept coherent in those 
caches. 


The dcbz instruction is treated as a store to the addressed byte with respect to address 
translation and protection. 


If the EA corresponds to an I/O controller interface segment (SR[T]=1), the dcbz 
instruction is treated as a no-op. 


See Chapter 5, “Exceptions,” for more information about a possible delayed machine check 
exception interrupt that can occur by use of dcbz if the operating system has set up an 
incorrect memory mapping. 


4.8.5 Data Cache Block Store Instruction (dcbst) 


If the block (the cache sector consisting of 32 bytes) containing the byte addressed by the 
BA is in coherence required mode, and a block containing the byte addressed by the EA is 
in the data cache of any processor and has been modified, the writing of it to main memory 
is initiated. 


The function of this instruction is independent of the wriie-ihrough and cache- 
inhibited/allowed modes of the block containing the byte addressed by the EA. 


This instruction is treated as a load from the addressed byte with respect to address 
translation and protection. 
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If the EA specifies a storage address for an I/O controller interface segment (segment 
register T-bit=1), the debst instruction is treated as a no-op. 


4.8.6 Data Cache Block Flush Instruction (dcbf) 


The action taken depends on the memory mode associated with the target, and on the state 
of the sector. The list below describes the action taken for the various cases. The actions 
described must be executed regardless of whether the page containing the addressed byte 
is in caching-inhibited or caching-allowed mode. 


¢ Coherence-required mode 


Unmodified sector—Invalidates copies of the sector in the caches of all processors. 


Moditied sector—Copies the sector to memory. Invalidates copies of the sector in 
the caches of all processors. 


Absent sector—If modified copies of the sector are in the caches of other processors, 
causes them to be copied to memory and invalidated. If unmodified copies are in the 
caches of other processors, cause those copies to be invalidated. 


¢ Coherence-not-required mode 
Unmodified sector—Invalidates the sector in the processor's cache. 
Modified sector—Copies the sector to memory. Invalidate the sector in the 
processor's cache. 
Absent sector—Does nothing. 


The MPC601 treats this instruction as a load from the addressed byte with respect to 
address translation and protection. 


4.8.7 Enforce In-Order Execution of I/O Instruction (eieio) 


The eieio instruction provides an ordering function for the effects of load and ‘store 
instructions executed by a given processor. Executing eieio ensures that all memory 
accesses previously initiated by the given processor are completed with respect to main 
memory before any memory accesses subsequently initiated by the processor access main 
memory. 


The eieio instruction orders loads and stores to caching-inhibited memory only. 


The eieio instruction is intended for use only in doing memory-mapped I/O. It can be 
thought of as placing a barrier into the stream of memory accesses issued by a processor, 
such that any given memory access appears to be on the same side of the barrier to both the 
processor and the I/O device. 


The eieio instruction may complete before previously initiated memory accesses have been 
performed with respect to other processors and mechanisms. 
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Unlike the sync instruction, eieio need not serialize the processor. It requires only that the 
processor execute memory accesses in the order described above, and enforce that order in 
any queues in the memory subsystem. 


4.8.8 Instruction Cache Block Invalidate Instruction (icbi) 


The icbi instruction is provided in the PowerPC architecture for use in processors with 
separate instruction and data caches. The effective address is computed, translated, and 
checked for protection violations as defined in the PowerPC architecture; however, the 
instruction functions as a no-op on the MPC601. 


The Data Cache Block Invalidate (debi) instruction may be used to invalidate instructions 
from the cache in the MPC60)1. Refer also to the following section that describes the 
requirements for self-modifying code. 


In other PowerPC processors, the icbi instruction executes as follows: 


¢ If the block (sector) containing the byte addressed by EA is in coherency-required 
mode and a sector containing the byte addressed by EA is in the instruction cache of 
any processor, the sector is made invalid in all such processors, so that subsequent 
references cause the sector to be refetched. 


¢ If coherency is not required for the sector containing the byte addressed by EA and 
a sector containing the byte addressed by EA is in the instruction cache of this 
processor, the sector is made invalid in this processor so that subsequent references 
cause the sector to be fetched from main memory (or from a cache). 


4.8.9 Instruction Synchronize Instruction (isync) 


The isyne instruction waits for all previous instructions to complete and then discards any 
prefetched instructions, causing subsequent instructions to be fetched (or refetched) from 
memory and to execute in the context established by the previous instructions. This 
instruction has no effect on other processors or on their caches. 


4.9 Bus Operations Caused by Cache Control 
Instructions 


Table 4-6 provides an overview of the bus operations initiated by cache control 
instructions. Note that Table 4-6 assumes that the WIM bits are set to 001; that is, since the 
cache is operating in write-back mode, caching is permitted and coherency is enforced. 
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Table 4-6.Bus Operations Caused by Cache Control Instructions (WIM = 001) 


er 
ee fe 
Ce CC 
ise [wean fom 
Ce 
CC Coe 
Cn Ce 
Cs 
sew 










Operation 










sync/eieio 






|None Writes over modified data 
No change Read State change on reload 
may cast out sector 
[tects [Ne 


Table 4-6 does not include noncacheable or write-through cases, nor does it completely 
describe the mechanisms for the operations described. These conditions are described in 
Section 4.11, “MESI State Transactions.” 





The cache control instructions are described in detail in Chapter 3, “Addressing Modes and 
Instruction Set Summary,” and Chapter 10. Several of the cache control instructions 
broadcast onto the MPC6()1 interface so that all processors in a multiprocessor system can 
take appropriate actions. The MPC601 contains snooping logic to monitor the bus for these 
commands and the control logic required to keep the cache and the memory queues 
coherent. Additional details on the specific bus operations performed by the MPC601can 
be found the Chapter 9, “System Interface Operation.” 


4.10 Memory Unit 


The MPC601’s memory unit contains read and write queues that buffer operations between 
the external interface and the cache. These operations are comprised of operations resulting 
from load and store instructions that are cache misses, read and write operations required 
to maintain cache coherency, and table search operations. As shown in Figure 4-5, the read 
queue contains two elements and the write queue contains three elements. Each element of 
the write queue can contain as many as eight words (one sector) of data. One element of the 
write queue, marked snoop in Figure 4-5, is dedicated to writing cache sectors to system 
memory after a modified sector is hit by a snoop from another processor or snooping device 
on the system bus. The use of this queue guarantees a high-priority operation receives a 
deterministic response time when snooping hits a modified sector. 
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DATA 
(from cache) 


ADDRESS 
(from cache) 


WRITE QUEUE 


ADDRESS 


Figure 4-5. Memory Unit 


The other two elements in the write queue are used for store operations and writing back 
modified sectors that have been deallocated by updating the queue; that is, when a cache 
sector is full, the least-recently used cache sector is deallocated by first being copied into 
the write queue and from there to system memory if it is modified. Note that snooping can 
occur after a sector has been pushed out into the write queue and before the data has been 
written to system memory. Therefore, to maintain a coherent memory, the write queue 
elements are compared to snooped addresses in the same way as the cache tags. If a snoop 
hits a write queue element, the data is first stored in system memory before it can be loaded 
into the cache of the snooping bus master. Full coherency checking between the cache and 
the write queue prevents dependency conflicts. 


The retry signals and bus operations pertaining to snooping are described in Chapter 9, 
“System Interface Operation.” 


Execution of a load or store instruction is considered complete when the associated address 
translation completes, guaranteeing that the instruction has completed to the point where it 
is known that it will not generate an internal exception. However, after address translation 
is complete, a read or write operation can generate an external exception. 


Load and store instructions are always issued and translated in program order with respect 
to other load and store instructions. However, a load or store operation that hits in the cache 
can complete ahead of those that miss in the cache. The MPC601 ensures memory 
consistency by comparing target addresses and prohibiting instructions from completing 
out of order if an address matches. Load and store operations can be forced to execute in 
strict program order. 
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4.10.1 Memory Unit Queuing Structure 


The memory queue receives requests from the cache unit for arbitration onto the MPC601 
bus interface. These requests may either be presented immediately to the bus interface logic 
or they may be queued for future arbitration onto the bus. The memory queue consists of a 
two-element load queue and a three-element write queue. Each write queue element can 
hold a sector of data (32 bytes) associated with a single address. 


Some operations presented to the memory queue cannot be queued. These operations 
typically require synchronization with respect to either the execution units, the cache, or 
the memory queue itself. In general, when these requests are presented and not arbitrated 
directly onto the bus, they stall above the cache (but do not necessarily prevent use of the 
cache) and attempt to re-arbitrate on the next cycle. These operations include the following: 

° Cache control instructions that are broadcast 

¢ Execution of the tlbie instruction 

* Execution of the sync instruction 

- Execution of the eieio instruction 

¢ Accesses to I/O controller interface segments 


¢ Cache requests for exclusive ownership when the sector is resident but not exclusive 
in the cache 


The memory queues also allows the optional loading of the sector adjacent to the one 
containing the critical data. As the memory read queue receives and processes cache sector 
reload requests, it is advantageous to fetch the other sector if it is not already in the cache 
unless fetching the other sector delays access to data required for the machine to continue 
processing. The memory unit logic detects whether other operations are pending; if not, it 
initiates a fetch for the other sector. Note that this function can be disabled by setting bit 26 
in HIDO (for instruction fetch misses) and bit 27 in HIDO (for load/store misses). 


4.10.2 Memory Unit Queuing Priorities 

This section describes the priorities for access to the system interface: 
1. High-priority cache push-out operations 
2. Normal snoop push-out operations 


3. I/O controller interface segment accesses that incur no additional delays (that is, 
they have not been retried because of latency). 


4. Cache instruction operations 

5. Read requests, such as loads, RWITMs, and instruction fetches 
6. Single-beat write operations 

7. sync instructions 

8. Optional cache-line fill operations 
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9. Cache sector cast-out operations 


10.1/O controller interface segment accesses that incur additional delays (that is, they 
have been retried because of latency) 


4.10.3 Bus Interface 


The bus interface logic sequences operations onto the MPC601 bus according to defined 
protocols. The bus interface logic is also responsible for snooping other bus traffic, 
presenting these operations to the rest of the device for coherency considerations and 
reporting the appropriate snoop status onto the bus. 


For additional information about the MPC601 bus interface and the bus protocols, refer to 
Chapter 9, “System Interface Operation.” 


4.11 MESI State Transactions 


Table 4-7 shows MESI state transitions for various operations. 


Table 4-7. MESI State Transitions 


Cache Current Next Bus 


Load or Fetch | Read Same | 1 Cast out of ee eee with kill 
(T = 0) sector 1 ee required) 
2 Pass four-beat read iil 
to memory queue 
3 Secondary cast out Write with kill 
of sector 2 (as 
required) 
Loador Fetch | Read S,E,M Same | Read data from cache 
T = 0) 
Load or Fetch | Read x1x Same | Pass single-beat read Read 
T=0 or Load to memory queue 
(T=1, 
BUID=x'7F’) 
al mR — - 
Load 


T=0 or Load 
(T=1, 
Read x1x CRTRY read (push Write with kill 
sector to write queue) 
vO /O load 
(T=1, controller 
BUID#x’7F’) load 


BUID=x'7F’) 
flax =| Read | Read | Acts like other reads but bus operation uses special encoding 






























Load or Fetch 
T=0 or Load 
(T=1, 
BUID=x'7F’) 
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Table 4-7. MESI State Transitions 


Current Next Bus 


Same | 1 Cast out of modified Write with kill 
sector 
2 Pass RWITM to 
memory queue 
3 Secondary cast out Write with kill 
of sector 2 
ae: 1 CRTRY write 


3 Write data to cache 


tae Write data to cache 


Pass single-beat write Write with 
to memory queue flush 


Store # stcx Write 
(T=0) 


Store # stcx Write 1 | 1 White datato cache | data to cache 
=0 
oe 2 Pass _wmesescse | Write with 
write to memory flush 
queue 


Store # stcex Write 1 CRTRY write Sa 
=0 

— 2 Push sector to write Write with kill 
queue 

Store (T=0) Write Pass single-beat write Write with 

or stcx to memory queue flush 

(WIM=1 Ox) 

or store (T=1, 

BUID=x'7F’) 

Store (T=0) Write CRTRY write 

or stcx 

(WIM=10x) 

or store (T=1, 

BUID=x'7F’) 

Store (T=0) | Write 1 CRTRY write f— 

or stcx 

(WIM=10x) 2 Push sector to write Write with kill 

or store (T=1, queue 

BUID=x'7F’) 


Store (T=1, vO /O store 
BUID#x'7F’) controller request 


Conditional | If the reserved bit is set, this operation is like other writes except the bus operation 
write uses a special encoding. 
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Table 4-7. MESI State Transitions 


Cache Current Next Bus 
TLB 1 CRTRY TLBI TLB invalidate 
invalidate 

Ee 


sync/eieio Synchroniz | Yes XXX X 1 CRTRY sync 
ation 

ee 

Er ee 


Datacache | Yes | xxx | 1S.E Same | 1 CRTRY debf feeen 
block flush 









| Same |same 1 | 3 State change only 


Datacache Push sector to write eae! with kill 
block flush queue 
Yes 


Datacache Same | 1 CRTRY debst ee 
block store 


















es de 


Datacache Push sector to write Write with kill 
block store queue 

Datacache | No Alignment trap 
block set to 

zero 

Data cache 10x Xx Alignment trap 
block set to : 
zero 





4 Secondary cast out Write with kill 
of sector 2 


| Same | | same [mM | 5 Clear sector 


Datacache | Yes 1 CRTRY debz So 
block set to 
zero 2 Pass }2 Passkil = 


| Same the 3 Clear sector 


ae a _ _ 


Datacache | Yes Same | 1 CRTRY dcbz St 

block set to = Ee 

zero 2 Cast out of modified | Write with kill 
sector 











Datacache 
block set to 
zero 
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Secondary 
cast out 
Single-beat 
read 


Four-beat 
read (quad- 
word 1) 


Four-beat 
read (quad- 
word 2)—S 


Four-beat 
read (quad- 
word 2)—E 


Four-beat 
write (quad- 
word 1) 


Four-beat 
write (quad- 
word 2) 


Optional 
reload of 
adjacent 
sector (quad- 
word 1) 
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Table 4-7. MESI State Transitions 
Cache Bus 
Operation | sync 
Datacache | No 
block touch 





|e 
= 





Datacache 
block touch 


Datacache 
block touch 


— 1 CRTRY read 










Bus 
L286 | coezctoe | option 
Same | Pass single-beat read Read 
to memory queue 


2 Push sector to write eee with kill 
queue 







Datacache Same | 1 Cast out of modified | Write with kill 
block touch sector (as required) 
2 Cora four-beat read Read 
to Cora queue 
3 Secondary cast out Write with kill 
of sector (as 
required) 
or ieee red 
Secondary Cast out Write with kill 
cast out 


Reload No XXX 
dump 1 


Reload No XXX 
dump 1 


Forward data_in 


Same | 1 Forward data_in 
2 Write data_in to 
cache 
al Write data_in to cache 


Write data_in to cache 
Same | 1 Splice and forward 
data_in 

2 Write data_in to 
cache 
Reload No XXX Write data_in to cache 
dump 2 


Reload XXX Same | Write data_in to cache 
dump 1 
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Table 4-7. MES! State Transitions 


Operation Cache Bus Current | Next Cache Aciions Bus 
P Operation | sync State State Operation 


Optional Reload Ss Write data_in to cache 
reload of dump 2 

adjacent 

sector (quad- 

word 2)—S 

Optional Reload No XXX Write data_in to cache 
reload of dump 2 

adjacent 

sector (quad- 

word 2)—E 


ES Snoop cS) State change only 
(committed) 
s—| Snoop XXX iS) State change only 
(committed) 
E-l| Snoop XXX State change only 
(committed) 
M-—>| Snoop No State change only 
(committed) 
Push Snoop XXX cS) Conditionally push Write with kill 
M—-sS 
Push Snoop XXX Conditionally push Write with kill 
M—| 
Push Snoop XXX Conditionally push Write with kill 
M—-E 


Note that debt is presented to the cache as a load operation. 
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Chapter 5 
Exceptions 


The PowerPC exception mechanism allows the processor to change to supervisor state as 
a result of external signals, errors, or unusual conditions arising in the execution of 
instructions. When exceptions occur, information, such as the instruction that should be 
executed after control is returned to the original program and the contents of the machine 
state register, is saved to the save/restore registers (SRRO and SRR1), program control 
passes from user to supervisor level, and software continues execution at an address 
(exception vector) predetermined for each exception. 


Although multiple exception conditions can map to a single exception vector, the specific 
condition can be determined by examining a register associated with the exception—for 
example, the DAE/source instruction service register (DSISR) and the floating-point status 
and control register (FPSCR). Additionally, specific exception conditions can be explicitly 
enabled or disabled by software. 


Except for the catastrophic asynchronous exceptions (machine check and system reset) the 
MPC601 exception model is precise, defined as follows: 


¢ The exception handler is given the address of the excepting instruction (or the next 
instruction to execute in the case of asynchronous, precise exceptions) 


¢ All instructions prior in the instruction stream to the excepting instruction have 
completed execution and have written back their results. 


¢ No instructions subsequent to the excepting instruction in the instruction stream 
have been issued. 


Although the PowerPC architecture supports out-of-order instruction dispatch, exceptions 
are handled in program order; therefore, while exception conditions may be recognized out 
of order, they are handled strictly in order. When an instruction-caused exception is 
recognized, any unexecuted instructions that appear earlier in the instruction stream, 
including any that have not yet entered execute state, are allowed to complete. Any 
exceptions, caused by those instructions are handled in order. Likewise, exceptions that are 
asynchronous and precise are recognized when they occur, but are not handled until all 
instructions currently in execute stage successfully complete execution and report their 
results. 
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Unless a catastrophic condition causes a system reset or machine check exception, only one 
exception is handled at a time. If, for example, a single instruction encounters multiple 
exception conditions, those conditions are encountered sequentially. After the exception 
handler handles an exception, the instruction execution continues until the next exception 
condition is encountered. This method of recognizing and handling exception conditions 
sequentially guarantees that exceptions are recoverable. 


Exception handlers should save the information saved in SRRO and SRR1I early to prevent 
the program state from being lost due to a system reset and machine check exception or to 
an instruction-caused exception in the exception handler. 


This chapter describes the MPC601 exception model, it explains each class of instruction, 
and it describes how the program state is saved for individual exceptions. 


5.1 Exception Classes 


All MPC601 exceptions can be described as either precise or imprecise and either 
synchronous or asynchronous. Asynchronous exceptions are caused by events external to 
the processor’s execution; synchronous exceptions, which are all handled precisely by the 
MPC601, are caused by instructions. 


The MPC601 exceptions are shown in Table 5-1. 


Table 5-1. MPC601 Exception Classifications 
Synchronous/Asynchronous Exception Type 
Asynchronous Imprecise Machine Check 

System Reset 


Asynchronous Precise External interrupt 
Decrementer 
Synchronous Precise Instruction-caused exceptions 


Although exceptions have other characteristics as well, such as whether they are maskable 
or nonmaskable, the distinctions shown in Table 5-1 define categories of exceptions that the 
MPC601 handles uniquely. Note that Table 5-1 includes no synchronous imprecise 
instructions. While the PowerPC architecture supports imprecise floating-point exceptions, 
they do not occur in the MPC601. 


Exceptions, and conditions that cause them, are listed in Table 5-2. 
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Table 5-2. Exceptions and Conditions 


Exception Vector Offset é 3 


System reset 00100 Asystem reset is caused by the assertion of either SRESET or HRESET. 
Machine check | 00200 A machine check is caused by the assertion of the TEA signal. 


The cause of a data access exception can be determined by the bit settings in 

the DSISR, listed as follows: 

1 Set if the translation of an attempted access is not found in the primary 
hash table entry group (HTEG), or in the rehashed secondary HTEG, or in 
the range of a BAT register; otherwise cleared. 

4  Setif a memory access is not permitted by the page or BAT protection 
mechanism described in Chapter 6; otherwise cleared. 

5 Setif the access was to an I/O segment (SR[T] =1) by an lwarx, stwex., or 
Iscbx instruction; otherwise cleared. 

6 Set for a store operation and cleared for a load operation. 

9 Set if an EA matches the address in the DABR while in one of the three 

compare modes. 

Set if eciwx or ecowx is used and EAR[E] is cleared. 




















































Instruction 
access 


An instruction access exception is caused when an instruction fetch cannot be 

performed for any of the following reasons: 

* The effective address cannot be translated. That is, there is a page fault for 
this portion of the translation, so an instruction access exception must be 
taken to retrieve the translation from a storage device such as a hard disk 
drive. 

* The fetch access is to an I/O segment. 

* The fetch access violates memory protection. If the K bits in the segment 

register and the PP bits in the PTE are set to prohibit read access, 

instructions cannot be fetched from this location. 


An alignment exception is caused when the MPC601 cannot perform a memory 

access for one of the following reasons: 

* The operand of a floating-point load or store operation is in an /O segment 
(SR[T]=1). 

* An Iscbx instruction crosses a page boundary. 

* The operand of a load or store (including string loads and stores) crosses a 
protection boundary. 

¢ The operand of a Imw or stmw instruction crosses a segment or BAT 
boundary. 

* The operand of a Data Cache Block Set to Zero (dcbz) instruction is in a 
page specified as write-through or cache-inhibited for a page-address 
translation access. 

¢ In little-endian mode, any operand that is not properly aligned 

* in litle-endian mede, any attemoted execution of the string/multiple 

instructions 
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Table 5-2. Exceptions and Conditions (Continued) 


Vector Offset . - 
Causing Conditions 


00700 Aprogram exception is caused by one of the following exception conditions, 

which correspond to bit settings in SRR1 and arise during execution of an 

instruction: 

¢ Floating-point enabled exception—A floating-point enabled exception 
condition is generated when the following condition is met: 

(MSR[FEO] | MSR[FE1]) & FRSCR[FEX] is 1. 

FPSCR[FEX] is set by the execution of a floating-point instruction that 
causes an enabled exception or by the execution of a “move to FPSCR" 
instruction that results in both an exception condition bit and its 
corresponding enable bit being set in the FPSCR. 

¢ Illegal instruction—An illegal instruction program exception is generated 
when execution of an instruction is attempted with an illegal opcode or illegal 
combination of opcode and extended opcode fields, or when execution of an 
optional instruction not provided in the MPC601 is attempted (these do not 
include those optional instructions that are treated as no-ops). 

¢ Privileged instruction—A privileged instruction type program exception is 
generated when the execution of a privileged instruction is attempted and the 
MSR register user privilege bit, MSR[PR], is set. In the MPC601, this 
exception is generated for mtspr or mfspr with an invalid SPR field if 
SPR[0]=1 and MSR[PR]=1. This may not be true for all PowerPC 
processors, 

¢ Trap—aA trap type program exception is generated when any of the 
conditions specified in a trap instruction is met. 

* Illegal operations—The MPC601 takes illegal operation program exceptions 

for unimplemented PowerPC instructions. The PowerPC instruction set is 

described in Chapter 3. 


Exception 
Type 


Program 






















































Floating-point 
unavailable 


Decrementer 


VO controller 
interface error 


A floating-point unavailable exception is caused by an attempt to execute a 
floating-point instruction (including floating-point load, store, and move 
instructions) when the floating-point available bit is disabled, MSR[FP]=0. 


00900 The decrementer exception occurs when the most significant bit of the 
decrementer (DEC) register transitions from 0 to 1. 


00A00 An V/O controller interface error exception is taken only when an operation to an 
/O segment fails (such a failure is indicated to the MPC601 by a particular bus 
reply packet). If an I/O controller interface exception is taken on a memory 
access directed to an I/O segment, the SRRO contains the address of the 
instruction following the offending instruction. Note that this exception may not 
be implemented in other PowerPC processors. 





Reserved 0OBO00 


System call 00c00 Asystem call exception occurs when a System Call (sc) instruction is executed. 


Reserved 00E00 Other PowerPC processors may use this vector for floating-point assist 


exceptions. 


Reserved 00E10-00FFF 


Reserved 01000-01 FFF 


Reserved, implementation-specific 
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Table 5-2. Exceptions and Conditions (Continued) 


Exception Vector Offset . 7 


Run mode 02000 The run mode exception is taken depending on the settings of the HID1 register 

exception and the MSR[SE] bit. 

The following modes correspond with bit settings in the HID1 register: 

¢ Normal run mode—no address break points are specified, and the MPC601 
executes from to zero to three instructions per cycle 

¢ Single instruction step mode—One instruction is processed at a time. The 
appropriate break action is taken after an instruction is executed and the 
processor quiesces. 

¢ Limited instruction address compare—The MPC601 runs at full speed (in 
parallel) until the EA of the instruction being decoded matches the EA 
contained in HID2. Addresses for branch instructions and floating-point 
instructions may never be detected. 

The following mode is taken when the MSR[SE] bit is set. 

« MSR{[SE] trace mode—Note that in other PowerPC implementations, the 

trace exception is a separate exception with its own vector x’00D00". 


5.1.1 Precise Exceptions 


In the MPC601, all synchronous exceptions and the asynchronous external interrupt and 
decrementer exceptions are handled precisely; that is, all instructions that occur in the 
instruction stream before the excepting event appear to complete and subsequent 
instructions execute after the exception has been handled. When one of the MPC601’s 
precise exceptions occurs, SRRO is set to point to an instruction such that all prior 
instructions in the instruction stream have completed execution and no subsequent 
instruction has begun execution. However, depending on the exception type, the instruction 
addressed by SRRO may not have completed execution. 























When an exception occurs, instruction dispatch (the issuance of instructions by the 
instruction fetch mechanism to any instruction execution mechanism) is halted and the 
following synchronization is performed: 


1. The exception mechanism waits for all previous instructions in the instruction 
stream to complete to a point where they report all exceptions they will cause. 


2. The processor ensures that all previous instructions in the instruction stream 
complete in the context in which they began execution. 


3. The exception mechanism is responsible for saving and restoring the processor state. 
After control passes back to the user level, there are no instructions in execute stage, 
and the user program instructions are dispatched and executed in this new context. 


The synchronization described above is sometimes referred to as context synchronization. 
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5.1.1.1 Synchronous/Precise Exceptions 


In the MPC601, all exceptions caused by instructions are precise. When instruction 
execution causes a precise exception, the following conditions exist at the exception point: 


¢ Depending on the type of exception, SRRO addresses either the instruction causing 
the exception or the immediately following instruction. The instruction addressed 
can be determined from the exception type and status bits, which are described with 
the description of each exception. 


¢ All instructions that precede the excepting instruction are allowed to complete 
before the exception is processed. However, some memory accesses generated by 
these preceding instructions may not have been performed with respect to all other 
processors or system devices. 


¢ The instruction causing the exception may not have begun execution, may have 
partially completed, or may have completed, depending on the exception type. 


¢ No subsequent instructions in the instruction stream complete execution. 


Note that other PowerPC microprocessors may support optional imprecise floating-point 
exception modes. While parallel processing allows the possibility of two instructions 
reporting exceptions during the same cycle, they are handled in program order. If a single 
instruction generates multiple exception conditions, those exceptions are handled 
sequentially, as described in Section 5.1.3, “Sequential Exception Processing.” Exception 
priorities are described in Section 5.1.2, “Exception Priorities.” 


5.1.1.2 Asynchronous/Precise Exceptions 


The MPC601 supports two asynchronous, precise exceptions—external interrupt and 
decrementer exceptions. For asynchronous exceptions, the following conditions exist at the 
exception point: 


¢ All instructions issued before the event that caused the exception, and any 
undispatched instructions that precede those instructions in the instruction stream, 
appear to have completed before the exception is processed. However, some 
memory accesses generated by these preceding instructions may not have been 
performed with respect to all other processors or system devices. 


¢ SRRO addresses the next instruction that would have been executed next had the 
exception not occurred. 


¢ Architecturally, no subsequent instructions in the instruction stream complete 
execution. 


These two exceptions are maskable. When the machine state register external interrupt 
enable bits are cleared (MSR[EE]=0), these exception conditions are latched and are not 
recognized until the EE bit is set. MSR[EE] is cleared automatically when an exception is 
taken to delay recognition of conditions causing asynchronous, precise exceptions. No two 
precise exceptions can be recognized simultaneously. Handling of an asynchronous, 
precise exception does not begin until all currently executing instructions complete and any 
synchronous, precise exceptions caused by those instructions have been handled, as 
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described in Section 5.1.3, “Sequential Exception Processing.” Exception priorities are 
described in Section 5.1.2, “Exception Priorities.” 


5.1.1.3 Asynchronous, Imprecise Exceptions 

There are two asynchronous, imprecise exceptions—system reset and machine check. 
These two exceptions have the highest priority and can occur while other exceptions are 
being processed. Note that asynchronous, imprecise exceptions are never delayed; 
therefore, if two of these exceptions occur in immediate succession, the state information 
saved by the first exception may be overwritten when the subsequent exception occurs. 


These exceptions cannot by masked by using the MSR[EE] bit. A machine check exception 
can only occur if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, 
the processor goes directly into checkstop state. When an imprecise exception occurs, the 
following conditions exist at the exception point: 


¢ The integer instruction pipeline acts as the point of reference for all instructions in 
the pipeline. When an asynchronous, imprecise exception occurs, floating-point 
instructions that have begun execution out-of-order ahead of integer instructions 
that have yet to be decoded do not complete execution. 


¢ SRRO addresses either the instruction causing the exception or some instruction 
following the instruction causing the exception. 


¢ Anexception is generated such that all instructions preceding the instruction 
addressed by SRRO appear to have completed with respect to the executing 
processor. 


Neither the instruction addressed by SRRO nor any subsequent instructions have begun 
execution. 


5.1.2 Exception Priorities 


This section describes how exceptions are prioritized. Exceptions are roughly prioritized 
by exception class, as follows: 


1. Asynchronous, imprecise exceptions have priority over all other exceptions. These 
exceptions are taken forcibly, and do not wait for the completion of any precise 
exception handling. 


2. Synchronous, precise exceptions are caused by instructions and are handled in strict 
program order. 


3. Asynchronous, precise exceptions (external interrupt and decrementer exceptions) 
are delayed until higher priority exceptions are handled. 


The exceptions are listed in Table 5-3 in order of highest to lowest priority. 
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Table 5-3. Exception Priorities 


vclass jriowy | eens | 
class 


Asynchronous, System reset—The system reset exception has the highest priority of all exceptions. 

imprecise If this exception exists, the exception mechanism ignores all other exceptions and 
generates a system reset exception. Instructions issued before the generation of a 
system reset exception cannot generate a nonmaskable exception. 


Machine check—The machine check exception is the second-highest priority 
exception. If this exception occurs, the exception mechanism ignores all other 
exceptions (except reset) and generates a machine check exception. Instructions 
issued before the generation of a machine check exception cannot generate a 
nonmaskable exception. 


Synchronous, 3 Instruction dependent— When an instruction causes an exception, the exception 
precise mechanism waits for any instructions prior to the exception instruction in the 
instruction stream to execute. Any exceptions caused by these instructions are 


handled first. It then generates the appropriate exception if no higher priority 
exception exists when the exception is to be generated. 

Note that a single instruction can cause multiple exceptions. The ordering of such 
exceptions is described in 5.1.3, “Sequential Exception Processing.” 


Asynchronous, 
precise 


External interrupt—The external interrupt exception mechanism waits for instructions 
currently dispatched to complete execution. After all dispatched instructions are 
executed, and any exceptions caused by those instructions are handled, the 
exception mechanism generates this exception if no higher priority exception exists. 
This exception is delayed if MSR[EE] is cleared. 


Decrementer—This exception is the lowest priority exception. When this exception is 
created, the exception mechanism waits for all other possible exceptions to be 
reported. It then generates this exception if no higher priority exception exists. This 
exception is delayed if MSR[EE] is cleared. 





5.1.3 Sequential Exception Processing 


Although more than one condition that can cause a precise exception can exist 
simultaneously, precise exceptions are handled sequentially in the MPC601. The order in 
which exceptions are recognized is determined by program order and whether the 
exception is synchronous or asynchronous, precise or imprecise, and masked or 
nonmasked. 


Synchronous, precise exceptions (that is, exceptions that are caused by instructions) are 
handled in strict program order, even though instructions can execute and exceptions can 
be detected out of order. Therefore, before the MPC601 processes an instruction-caused 
exception, it executes all instructions, and handles any resulting exceptions, that appear 
earlier in the instruction stream. 


A single instruction may generate multiple exception conditions. Of these exceptions, the 
MPC601 handles the exception it encounters first, then the execution of the excepting 
instruction continues until the next excepting condition is encountered. In the POWER 
architecture, this feature is referred to as ordered exceptions. 
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If the exception is asynchronous and precise (namely an external interrupt or decrementer 
exception), the MPC601 synchronizes the pipeline by completing the execution of any 
instruction in the execute stage and any undispatched instructions that appear earlier in the 
instruction stream (including any exceptions they generate) before handling the external 
interrupt or decrementer exceptions. 


5.1.3.1 Recognition of Asynchronous, imprecise Exceptions 
Exceptions that are nonmasked, imprecise, and asynchronous (namely system reset or 
machine check exceptions) may occur at any time. That is, these exceptions are not delayed 
if another exception is being handled. As a result, state information for the interrupted 
exception may be lost; therefore, these exceptions are typically non-recoverable. 


All other precise exceptions have lower priority than system reset and machine check 
exceptions, and they can be delayed. 


5.1.3.2 Recognition of Precise Exceptions 

Only one precise exception can be reported at a time. (Note that PowerPC implementations 
that support imprecise-mode floating-point enabled exceptions allow those to be handled 
in the same manner as described in this section.) 


Figure 5-1 illustrates the ordering of precise exceptions. Note that this ordering is on a per- 
instruction basis. If a precise, asynchronous exception condition occurs while instruction- 
caused exceptions are being processed, its handling is delayed until all instruction-caused 
exceptions are handled and the instruction completes execution. 


5.2 Exception Processing 


When an exception is taken, the MPC601 uses the save/restore registers, SRRO and SRR1, 
to save the contents of the machine state register for user-level mode and to identify where 
instruction execution should resume after the exception is handled. The save/restore 
register 0 (SRRO) is a 32-bit register that the MPC601 uses to save either the address of the 
instruction that causes the exception, the one that follows, or the next instruction that would 
have executed in the case of an asynchronous, imprecise exception. This address is used 
when an rfi instruction is executed. The SRRO is shown in Figure 5-2. 





MOTORCLA Chapter 5. Exceptions 5-9 


Instruction Access 


Program Exception 
(Illegal/Privileged Instruction) 


rfi? or mtmsr Integer Floatirig Point! 


Privileged Instruction FP Unavailable 






















FP Alignment 






System Call Alignment 


VO Cont UF 
Error 








Precise Mode FP Enabled? 
















Run Mode 





including Trace”) 


( 


External Interrupt 





Decrementer 


‘Not all floating-point instructions can cause enabled exceptions. 

2if the MSR bits FEO and FE1 are set such that precise mode floating-point enabled exceptions are 
enabled and the FPSCR[FEX] bit is set, a program exception will result. 

3Generating a trace exception after an rfican cause unpredictable results. 


Figure 5-1. Recognition of Precise Exception Conditions 








SRRO (holds EA for resuming program execution) 


Figure 5-2. Machine Status Save/Restore Register 0 


When an exception occurs, SRRO is set to point to an instruction such that all prior 
instructions have completed execution and no subsequent instruction has begun execution. 
The instruction addressed by SRRO may not have completed execution, depending on the 
exception type. SRRO addresses either the instruction causing the exception or the 
immediately following instruction. The instruction addressed can be determined from the 
exception type and status bits. 
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The SRR1 is a 32-bit register used to save machine status on exceptions and to restore 
machine status when rfi or sc is executed. The SRR1 is shown in Figure 5-3. 


exception-specific _ MSR[16-31] 


0 15 16 31 
Figure 5-3. Machine Status Save/Restore Register 1 


In general, when an exception occurs, bits 0-15 of SRR1 are loaded with exception-specific 
information and bits 16-31 of the machine state register (MSR) are placed into bits 16-31 
of SRR1. The machine state register is shown in Figure 5-4 


Reserved 


Teelprlrr|vdredse Peeler «oes 


0 15 16171819 20 2122 23 24 25 26 27 28 2930 31 





Figure 5-4. Machine State Register 
Table 5-4 shows the bit definitions for the MSR. 


Table 5-4. Machine State Register Bit Settings 


External interrupt enable ; 

0 ‘The processor delays recognition of external interrupts and decrementer exception 
conditions. 

1 ‘The processor is enabled to take an external interrupt or the decrementer exception. 



















Privilege level 
0 The processor can execute both user and supervisor privilege-level instructions. 
1 The processor can only execute user-level instructions. 












Floating-point available 

0 The processor prevents dispatch of floating-point instructions, including floating-point 
loads, stores and moves. Floating-point enabled program exceptions can still occur and 
the FPRs can still be accessed. 

1 The processor can execute floating-point instructions, and can take floating-point 

enabled exception type program exceptions. 







Machine check enable ; 
0 Machina check excentions are disabled, 


1. Machine check exceptions are enabled. 


}20 | FEO | Floating-point exception mode 0 (See Table 5-5.) 
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Table 5-4. Machine State Register Bit Settings (Continued) 


se 


Single-step trace enable 
The processor executes instructions normally. 
The processor generates a single-step trace exception upon the successful execution of 
the next instruction. When this bit is set, the processor dispatches instructions in strict 
program order. Successful execution means the instruction caused no other exception. 
Single-step tracing may not be present on all implementations. If the function is not 
implemented, MSR[SE] should be treated as a reserved MSR bit: mfmsr may return the 
last value written to the bit, or may return 0 always. 


[= fame 
Floating-point excepiion mode 1 (See Table 5-5.) 
24 Reserved. This bit corresponds to the AL bit of the POWER Architecture. 


Exception prefix. The setting of this bit specifies whether an exception vector offset is 
prepended with Fs or Os. In the following description, nnnnnis the offset of the exception. See 


Table 5-7. 
0 Exceptions are vectored to the physical address x'000n_nnnn’. 
1. Exceptions are vectored to the physical address x'FFFn_nnnn’. 


Instruction address translation 

0 Instruction address translation is off. When instruction relocation is off, EAis interpreted 
as described in Chapter 6, “Memory Management Unit.” 

1 __ Instruction address translation is enabled. 


Data address translation 

0 Data address translation is off. When data relocation is off, EAis interpreted as 
described in Chapter 6, “Memory Management Unit.” 

1. Data address translation is enabled. 


En 


“These reserved bits may be used by other PowerPC processors. Attempting to change these bits does 
not affect the operation of the processor. These bit positions always return a zero value when read. 





The floating-point exception mode bits are interpreted as shown in Table 5-S. For further 
details see Section 5.4.7.1, “Floating-Point Enabled Program Exceptions.” 


Table 5-5. Floating-Point Exception Mode Bits 


Ea Floating-point exceptions disabled 
inn Floating-point imprecise nonrecoverable 











Floating-point imprecise recoverable 


Floating-point precise mode 
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MSR bits 16-31 are guaranteed to be written to SRR1 when the first instruction of the 
exception handler is encountered. 


The data address register (DAR) is a 32-bit register used by several exceptions (data access, 
I/O controller interface error, and alignment) to identify the address of a memory element. 


5.2.1 Enabling and Disabling Exceptions 


When a condition exists that causes an exception to be generated, it must be determined 
whether the exception is enabled for that condition. 


¢ Floating-point enabled exceptions (a type of program exception) can be disabled by 
clearing both MSR[FEO] and MSR[FE1]. If either or both of these bits are set, all 
floating-point exceptions are taken and cause a program exception. Other PowerPC 
processors may support imprecise floating-point exceptions. Individual conditions 
that can generate floating-point exceptions can be enabled and disabled with bits in 
the FPSCR register. 


¢ Asynchronous, precise exceptions are enabled by setting the MSR[EE] bit. When 
MSR[EE]=0, recognition of these exception conditions is delayed. MSR[EE] is 
cleared automatically when an exception is taken to delay recognition of conditions 
causing those exceptions. 


¢ Amachine check exception can only occur if the machine check enable bit, 
MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop 
state when a machine-check exception condition occurs. 

* Therun mode exception, which is used to set an instruction breakpoint, can be 
enabled and disabled using bits 8 and 9 of HID1 (HID1[RM]). 

¢ The data address breakpoint can be enabled and disabled using bits 30 and 31 of the 
DABR (HIDS[SA)). 


¢ System reset exceptions cannot be masked. 


5.2.2 Steps for Exception Processing 


After it is determined that the exception can be taken (by confirming that any instruction- 
caused exceptions occurring earlier in the instruction stream have been handled, and by 
confirming that the exception is enabled for the exception condition), the MPC601 does the 
following: 


1. The machine status save/restore register 0 (SRRO) is loaded with an instruction 
address that depends on the type of exception. See the individual exception 
description for details about how this register is used for specific exceptions. 

2. Bits 0-15 of SRR1 are loaded with 16 bits of information specific to the exception 
type. 

3. Bits 16-31 of SRR1 are loaded with a copy of bits 16-31 of the MSR. 
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4. The MSR is set as described in Table 5-4. The new values take effect beginning with 


the fetching of the first instruction of the exception-handler routine located at the 
exception vector address. 


Note that MSR[IT] and MSR[DT] are cleared for all exception types; therefore, 
address translation is disabled for both instruction fetches and data access beginning 
with the first instruction of the exception-handler routine. 


Instruction fetch and execution resumes, using the new MSR value, at a location 
specific to the exception type. The location is determined by adding the exception's 
vector (see Table 5-7) to the base address determined by MSR[EP]. If EP is cleared, 
exceptions are vectored to the physical address x’000n_nnnn’. If EP is set, 
exceptions are vectored to the physical address x’FFFn_nnnn’. For a machine check 
exception that occurs when MSR[ME]=0 (machine check exceptions are disabled), 
the checkstop state is entered (the machine stops executing instructions). See 
Section 5.4.2, “Machine Check Exception (x’00200’).” 


The Iwarx and stwx instructions require special handling if a reservation is still set 
when an exception occurs. Exceptions clear reservations set with lwarx (or Idarx). 


5.2.3 Returning from Supervisor Mode 


The Return from Interrupt (rfi) instruction performs context synchronization by allowing 
previously issued instructions to complete before returning to user mode. Execution of the 
rfi instruction ensures the following: 


All previous instructions have completed to a point where they can no longer cause 
an exception. If a prior memory access causes an I/O controller interface error 
exception, the results must be determined before this instruction is executed. 


Previous instructions complete execution in the context (privilege, protection, and 
address translation) under which they were issued. 


The instructions following this instruction execute in the context established by this 
instruction. 


5.3 Process Switching 


The operating system should execute the following when processes are switched: 


5-14 


The sync instruction, to resolve any data dependencies between the processes and to 
synchronize the use of segment registers and SPRs. For an example showing use of 
the sync instruction, see Section 2.3.3.1, “Synchronization for Supervisor-Level 
SPRs, and Segment Registers.” 


The isynce instruction, to ensure that undispatched instructions not in the new 
process are not used by the new process 


The stwex. instruction, to clear any outstanding reservations, which ensures that an 
Iwarx instruction in the old process is not paired with an stwex. in the new process. 
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Note that if an exception handler is used to emulate an instruction that is not implemented 
in the MPC60)1, the exception handler must report in SRRO (and in the data address register 
[DAR] if applicable) the EA computed by the instruction being emulated and not one used 
to emulate the instruction being emulated. 


5.4 Exception Definitions 


Table 5-6 shows all the types of exceptions that can occur with the MPC601 and the MSR 
bit settings when the processor transitions to supervisor mode. The state of these bits prior 
to the exception is typically stored in SRR1. 


Table 5-6. MSR Setting Due to Exception 





ee...) 
ver [E]RTETSTe Ts PETE T Eee 
Type 16 17 18 19 20 21 23 25 26 27 31 
fees fe fepet-{>t>[»|-pepepe. 
(=> RS SSR DRE SNS 
cues fo fepe{-[e[efe}-Te][o. 


os 
access 


| External | 


pees Pete te tate pete tate 
ae oe eee ee eee ee ae 


Floating-point 
unavailable 


(noni Ke ASSEN RS Ae COS RAR 
























System call 







Run mode 
exception 


VO controller 
interface error 
exception 


0 Bitis cleared 

1 Bit is set 

— Bitis not altered 

Reserved bits are read as if written as 0. 






The setting of the exception prefix (EP) bit in the MSR determines how exceptions are 
vectored. If the bit is cleared, exceptions are vectored to the physical address x’000n_nnnn’ 

(where nnnnn is the vector offset); if EP is set, exceptions are vectored to the physical 
address x’FFFn_nnnn’. Table 5-7 shows the exception vector offset of the first instruction 
of the exception handler routine for each exception type. 
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Table 5-7. Exception Vector Offset Table 
Vector Offset 


(hex) Exception Type 


00B00 Reserved. Note that other PowerPC processors may use this as a 
vector for the trace exception. 


00C00 System call 


00D00 Reserved. Other PowerPC processors may use this as a vector for 
the trace exception. 


00E00 Reserved. Other PowerPC processors may use this vector for 
floating-point assist exceptions. 


implementation-specific exceptions. 


5.4.1 Reset Exceptions (x’00100’) 


The system reset exception is a nonmaskable, asynchronous exception signaled to the 
MPC601 either through the assertion of either of the reset signals (GSRESET or HRESET). 
The assertion of the soft reset signal, SRESET, as described in Section 8.2.9.4.2, “Soft 










‘Reset (SSRESET)—Input,” causes the soft reset exception to be taken and the physical base 


address of the handler is determined by the MSR[EP] bit. The assertion of the hard reset 
signal, HRESET, as described in Section 8.2.9.4.1, “Hard Reset (HRESET)—Input,” 
causes the hard reset exception to be taken and the physical address of the handler is always 
x’ FFFO 0100’ 
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5.4.1.1 Soft Reset 


Soft reset exceptions are imprecise—they break the instruction pipeline to handle the 
exception. As a result, the MPC601 does not support restarting the interrupted process; 
although it attempts to save the processor state in order to perform diagnostic operations. 
When a soft reset exception occurs, registers are set as shown in Table 5-8. 


Table 5-8. Soft Reset Exception—Register Settings 
Setting Description 
Set to the effective address of the instruction that the processor would have attempted to execute next 
if no exception conditions were present. 


SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR. Note that if the processor state is corrupted to the extent 


that execution cannot be reliably restarted, SRR1[30] is cleared. 


When a soft reset exception is taken, instruction execution resumes at offset x’00100’ from 
the physical base address indicated by MSR[EP]. 















Before returning to the main program, the exception handler should do the following: 


{. SRRO and SRRI should be given the values used by the rfi instruction. 
2. Execute rfi. 


It is not guaranteed that execution is recoverable. Typically, the processor is recoverable in 
a limited sense, if at all. This allows the use of diagnostic aids such as the ESP interface to 
determine system problems. 


5.4.1.2 Hard Reset 


This section describes the MPC601’s reset state after performing a hard reset operation 
(asserting HRESET as described in Section 8.2.9.4.1, “Hard Reset (HRESET)—Input,”). 
Note that a hard reset operation should be performed on power-on to appropriately reset the 
processor. Table 5-9 shows the state of the machine just before it fetches the first instruction 
after a hard reset. Because of the setting of the MSR[EP] bit caused by a hard reset, the first 
instruction is fetched from address x’ FFFO 0100’. 
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Table 5-9. Settings Caused by Hard Reset 


FPRs All Os 


FPSCR 00000000 










Condition register All Os 
All Os 


Link register 
BAT registers 
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Segment registers 
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Table 5-9. Settings Caused by Hard Reset (Continued) 


Tag directory All Os. (However, the LRU 


bits are initialized such 
that each side of the 
cache has a unique LRU 
value.) 





' Early releases (DD1) of the MPC601 hardware set this 
to x’00010000'. Other versions of silicon may be 
different (see Section 2.3.3.10, “Processor Version 
Register (PVR)” for setting information). 

2 Master checkstop enable on, sequencer GPR self-test 
checkstop invalid microcode instruction checkstop on. 
3 Note that if external clock is connected to RTC for the 
MPC601, then the RTCL, RTCU, and DEC can change 
from their initial value of Os without receiving 
instructions to load those registers. 


The following 1s also true after a hard reset operation: 


¢ External checkstops are enabled. 


¢ The on-chip COP has given control of the PIs/POs to the rest of the chip for 
functional use. 


¢ Since the reset exception has data and instruction translation disabled (MSR[DT] 
and MSR{IT] both cleared), the chip operates in direct address translation mode. 
This implies that instruction fetches as well as loads and stores are cacheable. 
(Operations that correspond to direct address translations are implicitly cacheable, 
not write-through mode, and require coherency checking on the bus). 


¢ All internal arrays and registers are cleared during the hard reset process. 


5.4.2 Machine Check Exception (x’00200’) 


The MPC601 conditionally initiates a machine-check exception after detecting the 
assertion of the TEA signal on the MPC601 interface. The assertion of the TEA signal 
indicates that a bus error occurred and the system terminates the current transaction. One 
clock cycle after TEA is asserted, the data bus signals go to the high-impedance state; 
however, data entering the GPR or the cache is not invalidated. 


If the MSR[ME] bit is set, the exception is recognized and handled; otherwise, the MPC6()1 
generates an internal checkstop condition. This may not lead to a true checkstop depending 
upon the state of the various checkstop enable control bits in the HIDO register. These are 
shown in Table 5-10. 


The checkstop sources and enables register (HIDO) is a supervisor-level register that 
defines enable and monitor bits for each of the checkstop sources in the MPC6()1. For 
debugging, HIDO[EM] (bit 16) can be cleared to disable the machine-check checkstop 
state. The HIDO register is described in Section 2.3.3.12.1, “Checkstop Sources and 
Enables Register—HID0.” 
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In general, it is expected that the TEA signal would be used by a memory controller to 
indicate a memory parity error or an uncorrectable memory ECC error. Note that the 
resulting machine check exception is imprecise and has priority over any exceptions caused 
by the instruction that generated the bus operation. 


Machine check exceptions are enabled when MSR[ME]=1; this is described in 
Section 5.4.2.1, “Machine Check Exception Enabled (MSR[ME] = 1).” If MSR[ME]=0 
and a machine check occurs, the processor enters the checkstop state. Checkstop state is 
described in 5.4.2.2, “Checkstop State (MSR[ME] = 0).” 


5.4.2.1 Machine Check Exception Enabled (MSR[ME] = 1) 


When a machine check exception is taken, registers are updated as shown in Table 5-10. 


Table 5-10. Machine Check Exception—Register Settings 


Set to the address of the next instruction that would have been executed in the interrupted 
instruction stream. Neither this instruction nor any others beyond it will have been executed. All 
preceding instructions will have been completed. 


0-15 Cleared 
16-31 Loaded from MSR[16-31]. Note that if the processor state is corrupted to the extent that 
execution cannot be reliably restarted, SRR1[30] is cleared. 


EE 0 

PR 0 

FP1 0 

ME 0 

Note that when a machine check exception is taken, the exception handler should set MSR[ME} 
as soon as itis practical to handle another TEA assertion. Otherwise, subsequent TEA assertions 
cause the processor to automatically enter the checkstop state. 

FEO 0 

SE 0 

FE1 0 

EP Value is not altered 

IT 0 

DT 0 





The machine check exception is almost always unrecoverable in the sense that execution 
cannot resume in the same context that existed before the exception. If the condition that 
caused the machine check does not otherwise prevent continued execution, MSR[ME] is 
set to allow the MPC6()1 to continue execution at the machine check exception vector 
address, x’00200’. Typically this record does not allow earlier processes to resume; 
however, the operating systems can then use the machine check exception handler to try to 
identify and log the cause of the machine check condition. 


When a machine check exception is taken, instruction execution resumes at offset x’00200’ 
from the physical base address indicated by MSR[EP]. 
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Before returning to the main program, the exception handler should do the following: 


1. SRRO and SRR1 should be given the values to be used by the rfi instruction. 
2. Execute rfi. 


5.4.2.2 Checkstop State (MSR[ME] = 0) 

When a processor is in the checkstop state, instruction processing is suspended and 
generally cannot be restarted without resetting the processor. The contents of all latches 
(except any associated with the bus clock) are frozen within two cycles upon entering 
checkstop state so that the state of the processor can be analyzed through the use of the ESP 
interface as an aid in problem determination. 


A machine check exception may result from referencing a nonexistent physical address, 
either directly (with MSR[DR]=0), or through an invalid translation. On such a system, for 
example, execution of a Data Cache Block Set to Zero (dcbz) instruction that introduces a 
block into the cache associated with a nonexistent physical address may delay the machine 
check exception until an attempt is made to store that block to main memory. 


Note that not all PowerPC processors provide the same level of error checking. The reasons 
a processor can enter checkstop state is implementation-dependent. 


5.4.3 Data Access Exception (x’00300’) 


A data access exception occurs when no higher priority exception exists and a data memory 
access cannot be performed. The condition that caused the data access exception can be 
determined by reading the DAE/source instruction service register (DSISR), a supervisor- 
level SPR (SPR18) that can be read by using the mfspr instruction. Bit settings are 
provided in Table 5-11. Table 5-11 also indicates which memory element is saved to the 
DAR. Data access exceptions can occur for any of the following reasons: 


¢ The effective address cannot be translated. That is, there is a page fault for this 
portion of the translation, so a data access exception must be taken to retrieve the 
translation from a storage device such as a hard disk drive. 


¢ The instruction is not supported for the type of memory addressed. Invalid 
instructions are described in Section 3.1.1.1, “Invalid Instruction Forms.” I/O 
controller interface segments are described in Section 9.6, “I/O Controller Interface 
Operation.” 


¢ The access violates memory protection. Access is not permitted by the key (Ks and 
Ku) and PP bits, which are set in the segment register and PTE for page protection 
and in the BATs for block protection. 


e The execution of an eciwx or ecowx instruction is disallowed pense the external 
access register enable bit (EAR[E]) is cleared. 
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These scenarios are common among all PowerPC processors. The following additional 
scenarios can cause a data access exception in the MPC601: 


¢ Anlwarx, stwex., or Iscbx instruction refers to anon—memory-forced I/O controller 
interface segment (that is, when the SR[T] = 1 and BUID # x’07P’). 


¢ Aneffective address matches the address in the data-address breakpoint register 
(DABR) while in one of the appropriate compare modes. For additional information 
on the DABR and the compare modes, refer to Section 2.3.3.12.4, “Data Address 
Breakpoint Register (DABR)—HIDS." 


Data access exceptions can be generated by load/store instructions, and the cache control 
instructions (dcbi, dcbz, dcbst, and dcbf). 


Although the MPC601 does not generally support memory accesses that cross a page 
boundary, load or store multiple as well as load or store string instructions that are word- 
aligned and cross a page boundary are handled. In these cases, if the second page has a 
translation error or protection violation associated with it, the MPC601 takes the data 
access exception in the middle of the instruction. In this case, the data address register 
(DAR) always points to the first byte address of the offending page. 


If an stwex. instruction has an effective address for which a normal store operation would 
cause a data access exception but the processor does not have the reservation from Iwarx, 
the MPC601 determines whether a data access exception occurs as follows: 


¢ If the reservation bit is cleared before the stwex. instruction executes, that 
instruction cannot generate an exception, regardless of whether the address 
translation would have failed, page protection would have been violated, or the 
address matches one in the DABR. 


¢ Adata access exception is taken if there is an address translation or page protection 
error, or if the address hits in the DABR as long as the reservation bit is set when the 
stwex. instruction begins execution. In particular, the exception is taken even if the 
reservation bit is cleared after execution begins. 


If the XER indicates that a load or store multiple instruction has a length of zero, a data 


_access exception does not occur, regardless of the effective address. The condition that 


caused the exception is defined in the DSISR. These conditions also use the data address 
register (DAR) as shown in Table 5-11. 


Table 5-11 shows the register settings for data access exceptions. 
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Table 5-11. Data Access Exception—Register Settings 


Setting Description 
SRRO Set to the effective address of the instruction that caused the exception. 


SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 












i : Ae net is not altered 

FEO O SE 0 

FE1 0O EP Value is not altered 
0 0 


Reserved on the MPC601. PowerPC architecture uses this bit for I/O controller interface error 
exceptions, which are vectored to x’00A00’ on the MPC601. 
1 Set if the translation of an attempted access is not found in the primary hash table entry group 





















(HTEG), or in the renashed secondary HTEG, or in the range of a BAT register; otherwise 
cleared. 

2-3 Cleared 

4 Set if a memory access is not permitted by the page or BAT protection mechanism; otherwise 
cleared. 

5 Set if the lwarx, stwex., or Iscbx instruction is attempted to I/O controller interface space. 

6 Set for a store operation and cleared for a load operation. 

7-8 Cleared 

9 Set if an EA matches the address in the DABR while in one of the three compare modes. 


10 Setif the segment table search fails to find a translation for the EA, otherwise cleared. 
11 Set if the instruction was an eciwx or ecowx and EAR[E] = 
12-31 Cleared 


Set to the effective address of a memory element as described in the following list: 

+ Abyte in the first word accessed in the page that caused the data access exception, for a byte, half 
word, or word memory access. 

* Abyte in the first double word accessed in the page that caused the data access exception, fora 

double-word memory access. 





When a data access exception is taken, instruction execution resumes at offset x’00300’ 
from the physical base address indicated by MSR[EP]. 


The architecture permits certain instructions to be partially executed when they cause a data 
access exception. These are as follows: 


¢ Load multiple or load string instructions—Some registers in the range of registers 
to be loaded may have been loaded. 


¢ Store multiple or store string instructions—Some bytes of memory in the range 
addressed may have been updated. 


In the cases above, the questions of how many registers and how much memory is altered 
are instruction- and boundary-dependent. However, memory protection is not violated. 
Furthermore, if some of the data accessed is in I/O controller interface space (SR[T]=1), 
and the instruction is not supported for I/O controller interface accesses, the locations in 
I/O controller interface space are not accessed. 
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To preserve the ability to restart, partial execution is not allowed for non-multiple and non- 
string integer load operations and the target register is not altered. For update forms, the 
update register (rA) is not altered. . 


5.4.4 Instruction Access Exception (x’00400’) 


An instruction access exception occurs when no higher priority exception exists and an 
attempt to fetch the next instruction to be executed cannot be performed for any of the 
following reasons: 


¢ The effective address cannot be translated. That is, there is a page fault for this 
portion of the translation, so an instruction access exception must be taken to 
retrieve the translation from a storage device such as a hard disk drive. 


¢ The fetch access is to an I/O controller interface segment that is not memory-forced. 


¢ The fetch access violates memory protection. Access is not permitted by the K and 
PP bits, which are set in the segment register and PTE for page protection and in the 
BATs for block protection. 


An instruction fetch to an I/O controller interface segment while MSR[IT] is set causes an 
instruction access exception on the MPC601. Register settings for instruction access 
exceptions are shown in Table 5-12. 


Table 5-12. Instruction Access Exception—Register Settings 


rd 


Set to the effective address of the instruction that the processor would have attempted to execute next 
if no exception conditions were present (if the exception occurs on attempting to fetch a branch target, 
SRRO is set to the branch target address). 


0 Cleared 

1 Set if the translation of an attempted access is not found in the primary hash table entry group 
(HTEG), or in the rehashed secondary HTEG, or in the range of an BAT register; otherwise 
cleared. 
Cleared 
Set if the fetch access was to an I/O controller interface segment (SR[T}=1); otherwise cleared. 
Set if a memory access is not permitted by the page or BAT protection mechanism, described in 
Chapter 6; otherwise cleared. 
Cleared 
Set if the segment table search fails to find a translation for the effective address; otherwise 


cleared. 
11-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 


EE 0 : SE 0 
PR O FE1 0 
FP1 0O EP Value is not altered 
ME Value is not altered IT 0 
FEO O DT 0 





When an instruction access exception is taken, instruction execution resumes at offset 
x’00400’ from the physical base address indicated by MSR[EP]. 
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5.4.5 External Interrupt (x’00500’) 


An external interrupt is signaled to the MPC601 by the assertion of the INT signal as 
described in Section 8.2.9.1, “Interrupt (INT)—Input.” The interrupt may be delayed by 
other higher priority exceptions or if the MSR[EE] bit is cleared when the exception occurs. 


After the pulse is detected, the MPC601 stops dispatching instructions and waits for 
pending instructions to complete. Therefore, exceptions caused by instructions in progress 
are taken before the external interrupt exception is taken. After all instructions complete, 
the MPC601 takes the external interrupt exception. 


The register settings for the external interrupt exception are shown in Table 5-13. 


| Table 5-13. External Interrupt—Register Settings 
Setting Description 
SRRO Set to the effective address of the instruction that the processor would have attempted to execute next 
if no interrupt conditions were present. 
SRR1 0-15 Cleared rs 
16-31 Loaded from bits 16-31 of the MSR 
EE 0 SE 0 





PR 0O FE1 0 
FP1 0 EP Value is not altered 
ME ‘Value is not altered IT 0 
FEO 0O DT O 


When an external interrupt exception is taken, instruction execution resumes at offset 
x’00500’ from the physical base address indicated by MSR[EP]}. 


5.4.6 Alignment Exception (x’00600’) 


This section describes conditions that can cause alignment exceptions in the MPC601. 
Similar to data access exceptions, alignment exceptions use the SRRO and SRR1 to save 
the machine state and the DSISR to determine the source of the exception. 


The register settings for alignment exceptions are shown in Table 5-14. 
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Table 5-14. Alignment Exception—Register Settings 


Setting Description 
SRRO Set to the effective address of the instruction that caused the exception. 


SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 


ae : ay 
FP1 0 EP Value is not altered 
ME Value is not altered IT 0 
FEO 0O DT 0O 


0-11 Cleared 
12-13 Cleared. (Note that these bits can be set by several 64-bit PowerPC 
instructions that are not supported in the MPC601) 
14 Cleared 
15-16 For instructions that use register indirect with index addressing—set 
to bits 29-30 of the instruction 
For instructions that use register indirect with immediate index 
addressing—cleared 
For instructions that use register indirect with index addressing—Set 
to bit 25 of the instruction 
For instructions that use register indirect with immediate index 
addressing— Set to bit 5 of the instruction 
For instructions that use register indirect with index addressing—Set 
to bits 21-24 of the instruction 
For instructions that use register indirect with immediate index 
addressing—set to bits 1—4 of the instruction 
22-26 Set to bits 6-10 (source or destination) of the instruction. Undefined 
for dcbz 
27-31 Set to bits 11-15 of the instruction (rA) 
Set to either bits 11-15 of the instruction or to any register number not 
in the range of registers loaded by a valid form instruction, for Imw, 
Iswi, and Iswx instructions. Otherwise undefined 
Note that for load or store instructions that use register indirect with index 
addressing, the DSISR can be set to the same value that would have 
resulted if the corresponding instruction uses register indirect with immediate 
index addressing had caused the exception. Similarly, for load or store 
instructions that use register indirect with immediate index addressing, 
DSISR can hold a value that would have resulted from an instruction that 
uses register indirect with index addressing. For example, an unaligned lwax 
instruction that crosses a protection boundary would normally cause the 
DSISR to be set to the following binary value: 
000000000000 00 0 01 0101 ttttt 27777 


If there is no corresponding instruction, no alternative value can be specified. 





5.4.6.1 Integer Alignment Exceptions 


The MPC601 is optimized for load and store operations that are aligned on natural 
boundaries. Operations that are not naturally aligned may suffer performance degradation, 
depending on the type of operation, the boundaries crossed, and the mode that the processor 
is in during execution. More specifically, these operations may either cause an alignment 
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exception or they may cause the processor to break the memory access into multiple, 
smaller accesses with respect to the cache and the memory subsystem. 


The MPC601 can initiate alignment exception for the accesses as shown in Table 5-15. In 
all of these cases, the appropriate range check is performed before the instruction begins 
execution. As a result, if an alignment exception is taken, it is guaranteed that no portion of 
the instruction has been executed. 


Table 5-15. Access Types 


eC 


5.4.6.1.1 Direct-Translation Access 

A direct-translation access occurs when both MSR[DT] and SR[T] are cleared. If a 
256-Mbyte boundary is crossed by any portion of the memory being accessed by an 
instruction (including string/multiples), an alignment exception is taken. 






5.4.6.1.2 1/O Controller Interface Access 

An I/O controller interface access occurs when a data access is initiated, SR[T] is set, and 
SR[BUID] is not equal to x'07F’. In the MPC601 (but not for the general PowerPC 
processor case), MSR[DT] is a don't care for this case. The following apply for I/O 
controller interface accesses: 


* If a256-Mbyte boundary will be crossed by any portion of the I/O controller 
interface space accessed by an instruction (the entire string for strings/multiples), an 
alignment exception is taken. 

¢ Floating-point loads and stores to I/O controller interface segments always cause an 
alignment exception, regardless of operand alignment. 

¢ The lwarx/stwex./scbx instructions that map into an I/O controller interface 


segment always cause a data access exception (not an alignment exception), 
regardless of operand alignment. 


Note that other I/O controller interface errors may generate an I/O controller interface error 
exception, as described in Section 5.4.10, “I/O Controller Interface Error Exception 
(x’00A00’).” 


5.4.6.1.3 Memory-Forced I/O Controller Interface Access 


A memory-forced I/O controller interface access occurs when SR[T] is set, and SR[BUID] 
is x')7F' in the MPC601 (not defined as part of the PowerPC architecture). MSR[DT] is a 
don't care for this case. 
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If a 256-Mbyte boundary is crossed by any portion of the memory being accessed by an 
instruction (including string/multiples), an alignment exception is taken. 


Note that floating-point instructions and Iwarx, stwex., and Iscbx instructions are handled 
as the page- and block-address translation cases for the memory-forced I/O controller 
interface segments. Memory-forced I/O controller interface operations do not cause special 
cases of alignment or data access exceptions. 


5.4.6.1.4 Page Address Translation Access 


A page-address translation access occurs when MSR[DT] is set, SR[T] is cleared and there 
is not a BTLB match. Note the following points: 


¢ The following is true for all loads and stores except strings/multiples: 
— An alignment exception is taken if the operand spans a 4-Kbyte boundary. 
— Byte operands never cause an alignment exception. 
— Half-word operands cause an alignment exception if the EA ends in x'FFF’. 
— Word operands cause an alignment exception if the EA ends in x'FFD—FFF'. 


— Double-word operands cause an alignment exception if the EA ends in x'FF9- 
FFF’. 
¢ The Iscbx instruction causes an alignment exception if any portion of the entire 
string crosses into the next 4-Kbyte page of memory. This is taken regardless of the 
starting address, even if the Iscbx operand starts on a word boundary. 


¢ All other string/multiple instructions (except Iscbx) take alignment exceptions as 
follows: 


— If the string/multiple starts on a word boundary and a 256-Mbyte boundary is 
crossed by any portion of the entire string/multiple, an alignment exception is 
taken. Note that it must be a 256-Mbyte crossing—a simple 4-Kbyte crossing 
does not cause an exception for a word-aligned string/multiple operation. 


— If any portion of the string/multiple will cross into the next 4-Kbyte page of 
memory, an alignment exception is taken. 


¢ The desz instruction causes an alignment exception if the access is to a page or block 
with the W (write-through) or I (cache-inhibit) bit set in the UTLB or BTLB, 
respectively. 


Note that the above summary indicates that a 256-Mbyte crossing always causes an 
alignment exception. This includes accesses of all four types regardless of alignment. Of 
course, non-string/multiple load and store operations can only cross this boundary if they 
are not aligned. 


Misaligned memory accesses that do not cause an alignment exception may not perform as 
well as an aligned access of the same type. In general, the IU is designed to efficiently 
handle memory access quantities of eight bytes or fewer that lie within a double-word 
boundary. Internally, all integer memory access instructions that involve more than four 
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bytes of data are broken into multiple access of four bytes or fewer. Floating-point memory 
access instructions always involve either four or eight bytes of data. Any memory access 

_ that crosses a double-word boundary is further broken into two smaller accesses that do not 
cross the double-word boundary. For multiple-word and string operations, the MPC6(0)1 
does not force alignment to reduce the number of accesses. 


The resulting performance degradation due to misaligned accesses depends on how well 
each individual access behaves with respect to the memory hierarchy. At a minimum, 
additional cache access cycles are required that can delay other processor resources from 
using the cache. More dramatically, for an access to a noncacheable page, each discrete 
access involves individual MPC601 bus operations that reduce the effective bandwidth of 
that bus. 


Finally, note that when the MPC601 is in page address translation mode, there is no special 
handling for accesses that fall into BAT regions. If one of the 4-Kbyte crossing conditions 
indicated above happens to be completely contained within a BAT register, the MPC601 
still takes the alignment exception. 


5.4.6.2 Floating-Point Alignment Exceptions 
An alignment exception occurs when no higher priority exception exists and the MPC601 
cannot perform a memory access for one of the following reasons: 


¢ The operand of a floating-point load or store operation is in a non—memory-forced 
1/O controller interface segment (SR[T]=1). 


¢ The operand of a load or store crosses a 4-Kbyte boundary if MSR[DT] is set or if 
the operand crosses a 256-Mbyte boundary if MSR[DT] is cleared. 


5.4.6.3 Little-Endian Mode Alignment Exceptions 

In little-endian mode, any operand that is not properly aligned (as described in 
Section 2.4.6, “PowerPC Data Memory with LM Set”), causes an alignment exception. 
Additionally, any attempted execution of the string/multiple instructions causes an 
alignment exception. 


5.4.6.4 Interpretation of the DSISR as Set by an Alignment Exception 
For most alignment exceptions, an exception handler may be designed to emulate the 
instruction that causes the exception. To do this, it needs the following characteristics of the 
‘instruction: 


¢ Load or store 

¢ Length (half word, word, or double word) 

¢ . String, multiple, or normal load/store 

¢ Integer or floating-point 

¢ Whether the instruction performs update 

¢ Whether the instruction performs byte reversal 
¢ Whether it is a dcbz instruction 
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The PowerPC architecture provides this information implicitly, by setting opcode bits in the 
DSISR that identify the excepting instruction type. The exception handler does not need to 
load the excepting instruction from memory. The mapping for all exception possibilities is 
unique except for the few exceptions discussed below. 


Table 5-16 shows the inverse mapping—how the DSISR bits identify the instruction that 
caused the exception. 


The alignment exception handler cannot distinguish a floating-point load or store that 
causes an exception because it is misaligned, or because it addresses the I/O controller 
interface space. However, this does not matter; in either case it is emulated with integer 
instructions. 


Table 5-16. DSISR(15—21) Settings to Determine Misaligned Instruction 







00 0 0000 Iwarx, lwz, proprietary | 
oo 
jm 
Lc 





00 1 0110 
00 1 O111 


00 1 1000 
00 1 1001 
00 1 1010 
00 1 1011 
0100101 
01 0 1000 
010 1001 
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Table 5-16. DSISR(15—21) Settings to Determine Misaligned Instruction (Continued) 


ce 
Ce 
ee 
[asin Ye 
ee 
ee 




























' The instructions Ilwz and Iwarx give the same DSISR bits (all zero). But if lwarx causes an 
alignment exception, it is an invalid form, so it need not be emulated in any precise way. It is 
adequate for the alignment exception handler to simply emulate the instruction as if it were an 
Iwz. It is important that the emulator use the address in the DAR, rather than computing it from 
rA/rB/D, because Iwz and Iwarx use different addressing modes. 
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5.4.7 Program Exception (x’00700’) 


A program exception occurs when no higher priority exception exists oan one or more of 
the following exception conditions, which correspond to bit settings in SRR1, occur during 
execution of an instruction: 


System floating-point enabled exception—A system floating-point enabled 
exception is generated when the following condition is met: 


(MSR[FEO] | MSR[FE1]) & FPSCR[FEX] is 1. 


FPSCR[FEX] is set by the execution of a floating-point instruction that causes an 
enabled exception or by the execution of a “move to FPSCR” type instruction that 
sets an exception bit when its corresponding enable bit is set. In the MPC601, all 
floating-point enabled exceptions are handled in a precise manner. As a result, all 
program exceptions taken on behalf of a floating-point enabled exception clear 
SRR1[15] to indicate that the address in SRRO points to the instruction that caused 
the exception. For more information, refer to Section 5.4.7.1, “Floating-Point 
Enabled Program Exceptions.” 


Illegal instruction—An illegal instruction program exception is generated when 
execution of an instruction is attempted with an illegal opcode or illegal 
combination of opcode and extended opcode fields, or when execution of an 
optional instruction not provided in the MPC601 is attempted (these do not include 
those optional instructions, such as Instruction Cache Block Invalidate, icbi, that are 
treated as no-ops). 

Privileged instruction—A privileged instruction type program exception is 
generated when the execution of a privileged instruction is attempted and the MSR 
register privileged bit, MSR[PR], is set. Some implementations may generate this 
exception for mtspr or mfspr with an invalid SPR field if spro=1 and MSR[PR]=1. 


Trap—A trap type program exception is generated when any of the conditions 
specified in a trap instruction is met. Trap instructions are described in Chapter 3, 
“Addressing Modes and Instruction Set Summary.” 


Illegal operations—The MPC601 takes illegal operation program exceptions for 
unimplemented PowerPC instructions. 


Note that instructions using an invalid instruction form do not take a program exception, 
but instead cause results that are boundedly undefined. The register settings are shown in 
Table 5-17. 


5-32 
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Table 5-17. Program Exception—Register Settings 


Setting Description 
SRRO Contains the effective address of the excepting instruction 


SRRI1 0-10 Cleared 
Set for a floating-point enabled program exception; otherwise cleared. 
Set for an illegal instruction program exception; otherwise cleared. 
Set for a privileged instruction program exception; otherwise cleared. 
Set for a trap program exception; otherwise cleared. 
Cleared if SRRO contains the address of the instruction causing the exception, and set if SRRO 


contains the address of a subsequent instruction. 
16-31 Loaded from bits 16-31 of the MSR. 
Note that only one of bits 11-14 can be set. 


MSR 0 
Value is not altered 
0 
Value is not altered 
0 


When a program exception is taken, instruction execution resumes at offset x’00700’ from 
the physical base address indicated by MSR[EP]. 





5.4.7.1 Floating-Point Enabled Program Exceptions 

In the MPC601, floating-point exceptions are signaled by condition bits set in the floating- 
point status and control register (FPSCR). They can cause the system floating-point enabled 
exception error handler to be invoked. All floating-point exceptions are handled precisely. 
The FPSCR is shown in Figure 5-5. 


FPSCR 


Reserved 









VXIDI VXSOFT 
VXxISsI VXSQRT 
VXSNAN VXCVI 


Belk CE eae rear | | 1 | [vejoejuelze|xe} | | 


2345 67 8 9101112 131415 192021 22 23 24 2526 27 28 29 30 31 
Figure 5-5. Floating-Point Status and Control Register 


A listing of FPSCR bit settings is shown in Table 5-18. 
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Table 5-18. FPSCR Bit Settings 


Floating-point exception summary (FX). Every floating-point instruction implicitly sets FPSCR[FX] if that 
instruction causes any of the floating-point exception bits in the FPSCR to transition from 0 to 1. The 
merfs instruction implicitly clears FPSCR[FX] if the FPSCR field containing FPSCRF x is copied. The 
mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions can set or clear FPSCR[FX] explicitly. This is a sticky bit. 



















Floating-point enabled exception summary (FEX). This bit signals the occurrence of any of the enabled 
exception conditions. It is the logical OR of all the floating-point exception bits masked with their 

respective enables. The merfs instruction implicitly clears FPSCR[FEX] if the result of the logical OR 
described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear 
FPSCR[FEX] explicitly. This is not a sticky bit. 



















Floating-point invalid operation exception summary (VX). This bit signals the occurrence of any invalid 
operation exception. It is the logical OR of all of the invalid operation exceptions. The merfs implicitly 
clears FPSCR[VX] if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, 
mtfsb0, and mtfsb1 instructions cannot set or clear FPSCR[VX] explicitly. This is not a sticky bit. 





Floating-point overflow exception (OX). This is a sticky bit. 





Floating-point underflow exception (UX). This is a sticky bit. 


Pe Floating-point zero divide exception (ZX). This is a sticky bit. 
ca Floating-point inexact exception (XX). This is a sticky bit. 
Floating-point invalid operation exception for SNaN (VXSNAN). This is a sticky bit. 


Ee: Floating-point invalid operation exception for oo-ce (VXISI). This is a sticky bit. 


Floating-point invalid operation exception for c0/eo (VXIDI). This is a sticky bit. 


Floating-point invalid operation exception for 0/0 (VXZDZ). This is a sticky bit. 













Floating-point invalid operation exception for -*0 (VXIMZ). This is a sticky bit. 





Floating-point invalid operation exception for invalid compare (VXVC). This is a sticky bit. 





Floating-point fraction rounded (FR). The last floating-point instruction that potentially rounded the 
intermediate result incremented the fraction. 





Floating-point fraction inexact (Fl). The last floating-point instruction that potentially rounded the 
intermediate result produced an inexact fraction or a disabled exponent overflow. 






Floating-point result flags (FPRF). This field is based on the value placed into the target register even if 
that value is undefined. Refer to Table 2-2 for specific bit settings. 
15 Floating-point result class descriptor (C). Floating-point instructions other than the compare 
instructions may set this bit with the FPCC bits, to indicate the class of the result. 
16-19 Floating-point condition code (FPCC). Floating-point compare instructions always set one of 
’ the FPCC bits to one and the other three FPCC bits to zero. Other floating-point instructions 
may set the FPCC bits with the C bit, to indicate the class of the result. Note that in this case the 
high-order three bits of the FPCC retain their relational significance indicating that the value is 
less than, greater than, or equal to zero. 
16 Floating-point less than or negative (FL or <) 
17 Floating-point greater than or positive (FG or >) 
18 Floating-point equal or zero (FE or. =) 
19 Floating-point unordered or NaN (FU or 7?) 


EN 
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Table 5-18. FPSCR Bit Settings (Continued) 


Floating-point invalid operation exception for software request (VXSOFT). This bit can be altered only by 
the merfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1 instructions. The purpose of VXSOFT is to allow software to 
cause an invalid operation condition for a condition that is not necessarily associated with the execution of 
a floating-point instruction. For example, it might be set by a program that computes a square root if the 

source operand is negative. This is a sticky bit. 
















Floating-point invalid operation exception for invalid square root (VXSQRT). This is a sticky bit. This 
guarantees that software can simulate fsqrt and frsqrte, and to provide a consistent interface to handle 
exceptions caused by square-root operations. 


Floating-point invalid operation exception for invalid integer convert (VXCVI). This is a sticky bit. See 
Section 5.4.7.2, “Invalid Operation Exception Conditions." 


[at | Fesinspiinvaldcpraionecopionerae VE) 

[es | Roaipontovetow ecepienebe(@) 

Floating-point underflow exception enable (UE). This bit should not be used to determine whether 
denormalization should be performed on floating-point stores 


rie Floating-point inexact exception enable (XE) : 
}29 | Reserved. This bit may be implemented as the non-IEEE made bit (NI) in other PowerPC implementations. 


Floating-point rounding control (RN). 
00 Round to nearest 

01 Round toward zero 
Round toward +infinity 
Round toward -infinity 















The following conditions that can cause program exceptions are detected by the processor. 
These conditions may occur during execution of floating-point arithmetic instructions. The 
corresponding bits set in the FPSCR are indicated in parentheses. 


¢ Invalid floating-point operation exception condition (VX) 
— SNaN condition (VXSNAN) 
— Infinity—-infinity condition (VXISI) 
— Infinity/infinity condition (VXIDI) 
— Zero/zero condition (VXZDZ) 
— Infinity*zero condition (VXIMZ) 
— [llegal compare condition (VXVC) 


These exception conditions are described in Section 5.4.7.2, “Invalid Operation 
Exception Conditions.” 


¢ Software request condition (VXSOFT). These exception conditions are described in 
Section 5.4.7.2, “Invalid Operation Exception Conditions.” 


¢ Illegal integer convert condition (VXCVI). These exception conditions are 
described in Section 5.4.7.2, “Invalid Operation Exception Conditions.” 
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¢ Zero divide exception condition (ZX). These exception conditions are described in 
Section 5.4.7.3, “Zero Divide Exception Condition.” 


¢ Overflow Exception Condition (OX). These exception conditions are described in 
Section 5.4.7.4, “Overflow Exception Condition.” 


¢ Underflow Exception Condition (UX). These exception conditions are described in 
Section 5.4.7.5, “Underflow Exception Condition.” 


¢ Inexact Exception Condition (XX). These exception conditions are described in 
Section 5.4.7.6, “Inexact Exception Condition.” 


Each floating-point exception condition and each category of illegal floating-point 
operation exception condition, has a corresponding exception bit in the FPSCR. In addition, 
each floating-point exception has a corresponding enable bit in the FPSCR. The exception 
bit indicates the occurrence of the corresponding condition. If a floating-point exception 
occurs, the corresponding enable bit governs the result produced by the instruction and, in 
conjunction with bits FEO and FE1, whether and how the system floating-point enabled 
exception error handler is invoked. (The “enabling” specified by the enable bit is of 
invoking the system error handler, not of permitting the exception condition to occur. The 
occurrence of an exception condition depends only on the instruction and its inputs, not on 
the setting of any control bits.) 


The floating-point exception summary bit (FX) in the FPSCR is set when any of the 
exception condition bits transitions from a zero to a one or when explicitly set by software. 
The floating-point enabled exception summary bit (FEX) in the FPSCR is set when any of 
the exception condition bits is set and the exception is enabled (enable bit is one). 


A single instruction may set more than one exception condition bit in the following cases: 
¢ The inexact exception condition bit may be set with overflow exception condition. 
¢ The inexact exception condition bit may be set with underflow exception condition. 


¢ The illegal floating-point operation exception condition bit (SNaN) may be set with 
illegal floating-point operation exception condition (°°*()) for multiply-add 
instructions. 


¢ The illegal operation exception condition bit (SNaN) may be set with illegal 
floating-point operation exception condition (illegal compare) for compare ordered 
instructions. 


¢ The illegal floating-point operation exception condition bit (SNaN) may be set with 
illegal floating-point operation exception condition (illegal integer convert) for 
convert to integer instructions. 


When an exception occurs, the instruction execution may be suppressed or a result may be 
delivered, depending on the exception condition. 
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Instruction execution is suppressed for the following kinds of exception conditions, so that 
there is no possibility that one of the operands is lost: 


¢ Enabled illegal floating-point operation 
¢ Enabled zero divide 


For the remaining kinds of exception conditions, a result is generated and written to the 
destination specified by the instruction causing the exception. The result may be a different 
value for the enabled and disabled conditions for some of these exception conditions. The 
kinds of exception conditions that deliver a result are the following: 


¢ Disabled illegal floating-point operation 
¢ Disabled zero divide 

¢ Disabled overflow 

¢ Disabled underflow 

¢ Disabled inexact 

¢ Enabled overflow 

¢ Enabled underflow 

¢« Enabled inexact 


Subsequent sections define each of the floating-point exception conditions and specify the 
action taken when they are detected. 


The IEEE standard specifies the handling of exception conditions in terms of traps and trap 
handlers. In the PowerPC architecture, setting an FPSCR exception enable bit causes 
generation of the result value specified in the IEEE standard for the trap enabled case—the 
expectation is that the exception is detected by software, which will revise the result. An 
FPSCR exception enable bit of 0 causes generation of the default result value specified for 
the trap disabled (or no trap occurs or trap is not implemented) case—the expectation is that 
the exception will not be detected by software, which will simply use the default result. The 
result to be delivered in each case for each exception is described in the following sections. 


The IEEE default behavior when an exception occurs, which is to generate a default value 
and not to notify software, is obtained by clearing all FPSCR exception enable bits and 
using ignore exceptions mode (see Table 5-19). In this case the system floating-point 
enabled exception error handler is not invoked, even if floating-point exceptions occur. If 
necessary, software can inspect the FPSCR exception bits to determine whether exceptions 
have occurred, 


If the program exception handler notifies software that a given exception condition has 
occurred, the corresponding FPSCR exception enable bit must be set and a mode other than 
ignore exceptions mode must be used. In this case the system floating-point enabled 
exception error handler is invoked if an enabled floating-point exception condition occurs. 


Whether and how the system floating-point enabled exception error handler is invoked if 
an enabled floating-point exception occurs is controlled by MSR bits FEO and FEI as 
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shown in Table 5-19. (The system floating-point enabled exception error handler is never 
invoked because of a disabled floating-point exception.) 


Table 5-19. MSR[FEO] and MSR[FE1] Bit Settings 


eel ee 


Ignore exceptions mode—Floating-point exceptions do not cause the program exception error 
handler to be invoked. 


Imprecise nonrecoverable mode—This mode is not applicable to the MPC601. FEO and FE1 or 
ORed, so setting either bit results in running the processor in precise mode. Note that in PowerPC 
processors that support this mode, the system floating-point enabled exception error handler is 
invoked at some point at or beyond the instruction that caused the enabled exception. The state of 
the processor may include conditions and data affected by the exception (that is, hazards are not 
avoided). It may not be possible to identify the excepting instruction or the data that caused the 
exception (that is, the data is not recoverable). 















Imprecise recoverable mode—This mode is not applicable to the MPC601. FEO and FE1 or ORed, so 
setting either bit results in running the processor in precise mode. Note that in PowerPC processors 
that support this mode, the system floating-point enabled exception error handler is invoked at some 
point at or beyond the instruction that caused the enabled exception. Sufficient information is 

provided to the system floating-point enabled exception error handler that it can identify the excepting 
instruction and the operands, and correct the result. All hazards caused by the exception are avoided 
(for example, use of the data that would have been produced by the excepting instruction). 











Precise mode—The system floating-point enabled exception error handler is invoked precisely at the 
instruction that caused the enabled exception. 






Note that in the MPC601, FEO and FE1 are ORed; therefore, unless both FEO and FE] are 
cleared, the MPC6()1 operates in precise mode. Whether a floating-point result is stored and 
what value is stored is determined by the FPSCR exception enable bits, as described in 
subsequent sections, and are not affected by any MSR bit settings. 


Whenever the system floating-point enabled exception error handler is invoked, the 
microprocessor ensures that all instructions logically residing before the excepting 
instruction have completed, and no instruction after that instruction has been executed. 


If exceptions are ignored, an FPSCR instruction can be used to force any exceptions, due 
to instructions initiated before the FPSCR instruction, to be recorded in the FPSCR. A sync 
instruction can also be used to force exceptions, but is likely to degrade performance more 
than an FPSCR instruction. 
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For the best performance across the widest range of implementations, the following 
guidelines should be considered: 


If the IEEE default results are acceptable to the application, FEO and FE1 should be 
cleared (ignore exceptions mode). All FPSCR exception enable bits should be 
cleared. 


Ignore exceptions mode should not, in general, be used when any FPSCR exception 
enable bits are set. 


Precise mode may degrade performance in some implementations, perhaps 
substantially, and therefore should be used only for debugging and other specialized 
applications. 


5.4.7.2 Invalid Operation Exception Conditions 


An invalid operation exception occurs when an operand is invalid for the specified 
operation. The invalid operations are as follows: 


Any operation except load, store, move, select, or mtfsf on a signaling NaN (SNaN) 
For add or subtract operations, magnitude subtraction of infinities (ce-co) 

Division of infinity by infinity (co/co) 

Division of zero by zero (0/0) 

Multiplication of infinity by zero (c0*()) 

Ordered comparison involving a NaN (invalid compare) 


Square root or reciprocal square root of a negative, non-zero number (invalid square 
root) 


Integer convert involving a number that is too large to be represented in the format, 
an infinity, or a NaN (invalid integer convert) 


FPSCR[VXSOFT] allows software to cause an invalid operation exception for a condition 
that is not necessarily associated with the execution of a floating-point instruction. For 
example, it might be set by a program that computes a square root if the source operand is 
negative. This facilitates the emulation of PowerPC instructions not implemented in the 
MPC601. 
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5.4.7.2.1 Action for Invalid Operation Exception Conditions 


The action to be taken depends on the setting of the invalid operation exception enable bit 
of the FPSCR. When invalid operation exception is enabled (FPSCR[VE]=1) and invalid 
operation occurs or software explicitly requests the exception, the following actions are 
taken: 


¢ One or two invalid operation exceptions is set 
FPSCR[VXSNAN ](if SNaN) 
FPSCR[VXISI] (if 00-00) 
FPSCR[VXIDI] (if ¢0/c0) 
FPSCR[VXZDZ](if 0/0) 
FPSCR[VXIMZ](if co*() 
FPSCR[VX VC] (if invalid comparison) 
FPSCR[VXSOFT](if software request) 
FPSCR[VXCVI](if invalid integer convert) 


¢ If the operation is an arithmetic or convert-to-integer operation, 
the target FPR is unchanged 
FPSCR[FR FI] are cleared 
FPSCR[FPRF] is unchanged 


¢ If the operation is a compare, 
FPSCR[FR FI C] are unchanged 
FPSCR[FPCC] is set to reflect unordered 


¢ If software explicitly requests the exception, 
FPSCR[FR FI FPRF] are as set by the mtfsfi, mtfsf, or mtfsb1 instruction 


When invalid operation exception condition is disabled (FPSCRVE=0) and invalid 
operation occurs or software explicitly requests the exception, the following actions are 
taken: 


¢ One or two invalid operation exception condition bits is set 
FPSCR[VXSNAN ](if SNaN) 
FPSCR[VXISI] (if e0-02) 
FPSCR[VXIDI] (if 2°/c¢) 
FPSCR[VXZDZ] (if 0/0) 
FPSCR[VXIMZ] (if oo*()) 
FPSCR[VXVC](if invalid comparison) 
FPSCR[VXSOFT](if software request) 
FPSCR[VXCVI](if invalid integer convert) 


¢ If the operation is an arithmetic operation, the target FPR is set to a ae NaN 
FPSCR[FR FI] are cleared 
FPSCR[FPRF] is set to indicate the class of the result (quiet NaN) 


5-40 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


¢ If the operation is a convert to 32-bit integer operation, the target FPR is set as 
follows: 
FRT[0—31] = undefined 
FRT[32-63] = most negative 32-bit integer FPSCR[FR FI] are cleared 
FPSCR[FPRF] is undefined 


¢ If the operation is a convert to 64-bit integer operation, the target FPR is set as 
follows: 
FRT[0-63] = most negative 64-bit integer 
FPSCR[FR FI] are cleared 
FPSCR[FPRF] is undetined 


¢ If the operation is a compare, 
FPSCRe_r Frc are unchanged 
FPSCR[FPCC] is set to reflect unordered 


¢ If software explicitly requests the exception, 
FPSCR[FR FI FPRF] are as set by the mtfsfi, mtfsf, or mtfsb1 instruction 


5.4.7.3 Zero Divide Exception Condition 
A zero divide exception condition occurs when a divide instruction is executed with a zero 
divisor value and a finite, non-zero dividend value. 


The name is a misnomer used for historical reasons. The proper name for this exception 
condition should be exact infinite result from finite operands exception condition 
corresponding to a mathematical pole. 


5.4.7.3.1 Action for Zero Divide Exception Condition 


The action to be taken depends on the setting of the zero divide exception condition enable 
bit of the FPSCR. When the zero divide exception condition is enabled (FPSCR[ZE]=1) 
and a zero divide condition occurs, the following actions are taken: 


¢ Zero divide exception condition bit is set 
FPSCR[ZX] = 1 


¢ The target FPR is unchanged 

¢ FPSCR[FR FI] are cleared 

¢ FPSCR[FPRF] is unchanged 
When zero divide exception condition is disabled (FPSCR[ZE]=0) and zero divide occurs, 
the following actions are taken: 


¢ Zero divide exception condition bit is set 
FPSCR[ZX] = 1 

¢ The target FPR is set to a tinfinity, where the sign is determined by the 
XOR of the signs of the operands 


¢ FPSCR[FR FI] are cleared 


¢ FPSCR[FPRF] is set to indicate the class and sign of the result (+infinity) 





MOTOROLA Chapter 5. Exceptions 5-41 


5.4.7.4 Overflow Exception Condition 

Overflow occurs when the magnitude of what would have been the rounded result if the 
exponent range were unbounded exceeds that of the largest finite number of the specified 
result precision. 


5.4.7.4.1_ Action for Overflow Exception Condition 

The action to be taken depends on the setting of the overflow exception condition enable 
bit of the FPSCR. When the overflow exception condition is enabled (FPSCR[OE]=1) and 
an exponent overflow condition occurs, the following actions are taken: 


¢ Overflow exception condition bit is set 
FPSCR[OX] = 1 
¢ For double-precision arithmetic instructions, the exponent of the normalized 
intermediate result is adjusted by subtracting 1536 


¢ For single-precision arithmetic instructions and the floating round to single- 
precision instruction, the exponent of the normalized intermediate result is adjusted 
by subtracting 192 


¢ The adjusted rounded result is placed into the target FPR 
* FPSCR[FPRF] is set to indicate the class and sign of the result (tnormal number) 


When the overflow exception condition is disabled (FPSCR[OE]=0) and an overflow 
condition occurs, the following actions are taken: 


* Overflow exception condition bit is set 
FPSCR[OX] = 1. 


¢ Inexact exception condition bit is set 
FPSCR[XX] = 1 


¢ The result is determined by the rounding mode (FPSCR[RN]) and the sign of the 
intermediate result as follows: 


— Round to nearest 
Store + infinity, where the sign is the sign of the intermediate result 


— Round toward zero 
Store the format's largest finite number with the sign of the intermediate result 


— Round toward +infinity 
For negative overflows, store the format's most negative finite number; for 
positive overflows, store +infinity 

— Round toward -infinity 
For negative overflows, store -infinity; for positive overflows, store the format's 
largest finite number 
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¢ The result is placed into the target FPR 
¢ FPSCR[FR FI] are cleared 


° FPSCR[FPRF] is set to indicate the class and sign of the result (infinity 
or tnormal number) 


5.4.7.5 Underflow Exception Condition 
The underflow exception condition is defined separately for the enabled and disabled states: 


¢ Enabled—Underflow occurs when the intermediate result is “Tiny.” 


¢ Disabled—Underflow occurs when the intermediate result is “Tiny” and there is 
“Loss of Accuracy.” 


A "Tiny" result is detected before rounding, when a non-zero result value computed as 
though the exponent range were unbounded would be less in magnitude than the smallest 
normalized number. 


If the intermediate result is “tiny” and the underflow exception condition enable bit is 
cleared (FPSCR[UE]=0), the intermediate result is denormalized (see Section 2.4.9.4, 
“Normalization and Denormalization”) and rounded (see Section 2.4.9.6, “Rounding’”’). 


“Loss of Accuracy” is detected when the delivered result value differs from what would 
have been computed were both the exponent range and precision unbounded. 


5.4.7.5.1 Action for Underflow Exception Condition 


The action to be taken depends on the setting of the underflow exception condition enable 
bit of the FPSCR. 


When the underflow exception condition is enabled (FPSCR[UE]=1) and an exponent 
underflow condition occurs, the following actions are taken: 
¢ Underflow exception condition bit is set 
FPSCR[UX] = 1 
¢ enable For double-precision arithmetic and conversion instructions, the exponent of 
the normalized intermediate result is adjusted by adding 1536 


¢ For single-precision arithmetic instructions and the floating round to single- 
precision instruction, the exponent of the normalized intermediate result is adjusted 
by adding 192 


¢ The adjusted rounded result is placed into the target FPR 


¢ FPSCR[FPRF] is set to indicate the class and sign of the result (tnormalized 
number) 


The FR and FI bits in the FPSCR allow the system floating-point enabled exception error » 


handler, when invoked because of an underflow exception condition, to simulate a trap 
disabled environment. That is, the FR and FI bits allow the system floating-point enabled 
exception error handler to unround the result, thus allowing the result to be denormalized. 
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When the underflow exception condition is disabled (FPSCR[UE]=() and an underflow 
condition occurs, the following actions are taken: 


¢ Underflow exception condition enable bit is set 
FPSCR[UX] = 1 


¢ The rounded result is placed into the target FPR 


¢ FPSCR[FPRF] is set to indicate the class and sign of the result 
(+denormalized number or +zero) 


5.4.7.6 Inexact Exception Condition 
The inexact exception condition occurs when one of two conditions occur during rounding: 


¢ The rounded result differs from the intermediate result assuming the intermediate 
result exponent range and precision to be unbounded. 


¢ The rounded result overflows and overflow exception condition is disabled. 


5.4.7.6.1 Action for Inexact Exception Condition 
The action to be taken does not depend on the setting of the inexact exception condition 


enable bit of the FPSCR. 
When the inexact exception condition occurs, the following actions are taken: 


¢ Inexact exception condition enable bit in the FPSCR is set 
FPSCR[XX] = 1 


e The rounded or overflowed result is placed into the target FPR 
¢ FPSCR[FPRF] is set to indicate the class and sign of the result 


In other PowerPC implementations, enabling inexact exception conditions may have 
greater latency than enabling other types of floating-point exception condition. 


5.4.8 Floating-Point Unavailable Exception (x’00800’) 


A floating-point unavailable exception occurs when no higher priority exception exists, an 
attempt is made to execute a floating-point instruction (including floating-point load, store, 
and move instructions), and the floating-point available bit in the MSR is disabled, 
(MSR[FP]=0). 


The register settings for floating-point unavailable exceptions are shown in Table 5-20. 
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Table 5-20. Floating-Point Unavailable Exception—Register Settings 


Setting Description 
SRRO Set to the effective address of the instruction that caused the exception. 


SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 







ey : 
a) : EP Value is not altered 
ME Value is not altered IT 0 






FEO 0 DT 0O 





When a floating-point unavailable exception is taken, instruction execution resumes at 
offset x’00800’ from the physical base address indicated by MSR[EP]. 


5.4.9 Decrementer Exception (x’00900’) 


A decrementer exception occurs when no higher priority exception exists, the decrementer 
register has completed decrementing, and MSR[EE]=1. The decrementer exception request 
is canceled when the exception is handled. The decrementer register counts down, causing 
an exception (unless masked) when passing through zero. The decrementer implementation 
meets the following requirements: 


¢ The operation of the RTC and the decrementer are coherent; that is, the counters are 
driven by the same fundamental time base (7.8125 MHz). 


¢ Loading a GPR from the decrementer does not affect the decrementer. 


¢ Storing a GPR value to the decrementer replaces the value in the decrementer with 
the value in the GPR. 


¢ Whenever bit () of the decrementer changes from () to 1, an exception request is 
signaled. If multiple decrementer exception requests are received before the first can 
be reported, only one exception is reported. The occurrence of adecrementer 
exception cancels the request. 


e Ifthe decrementer is altered by software and if bit 0 is changed from 0 to 1, an 
interrupt request is signaled. 


The register settings for the decrementer exception are shown in Table 5-21. 
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Table 5-21. Decrementer Exception—Register Settings 
Register Setting Description 
SRRO Set to the effective address of the instruction that the processor would have attempted to execute next 
if no exception conditions were present. 
SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 


EE 0 0 

PR O FE1 0 

FP1 0 Value is not altered 
ME Value is not altered 

FEO 0 





When a decrementer exception is taken, instruction execution resumes at offset x’00900’ 
from the physical base address indicated by MSR[EP]. 


5.4.10 1/O Controller Interface Error Exception (x’00A00’) 


An I/O controller interface error exception occurs when no higher-order priority exists and 
a load or store corresponding to an I/O controller interface segment generates an error. I/O 
controller interface operations are described in Section 9.6, “I/O Controller Interface 
Operation.” 


This exception is taken only when an operation to an I/O controller interface segment fails 
(such a failure is indicated to the MPC601 by a particular bus reply packet). If an I/O 
controller interface error exception occurs, the SRRO contains the address of the instruction 
following the excepting instruction. Note that illegal accesses to I/O controller interface 
space cause alignment or data access exceptions. For information refer to Section 5.4.6.1.2, 
“I/O Controller Interface Access.” Note that this exception is specific to the MPC601. The 
PowerPC architecture treats I/O controller interface exceptions as data access exceptions 
(x’00300). 


The register settings for I/O controller interface error exceptions are shown in Table 5-22. 
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Table 5-22. I/O Controller Interface Error Exception—Register Settings 


Setting Description 
SRRO Set to the effective address of the instruction following the instruction that caused the instruction. The 
addressed instruction has not been executed. SRRO contains the EA of the instruction following the 


load or store that caused the exception. 


SRR1 0-15 Cleared 
16-31 Loaded from bits 16-31 of the MSR 


SE 0 
FE1 0 
Value is not altered 





When an I/O controller interface error exception is taken, instruction execution resumes at 
offset x’O0A00’ from the physical base address indicated by MSR[EP]. 


5.4.11 System Call Exception (x’00C00’) 


A system call exception occurs when a System Call (sc) instruction is executed. The 
effective address of the instruction following the sc instruction is placed into SRRO. Bits 
16-31 of the MSR are placed into bits 16-31 of SRR1, and bits 0-15 of SRR1 are set to 
undefined values. Then a system call exception is generated. 


The system call exception causes the next instruction to be fetched from offset x’O00C00’ 
from the physical base address indicated by the new setting of MSR[IP]. This instruction is 
context synchronizing. That is, when a system call exception occurs, instruction dispatch is 
halted and the following synchronization is performed: 


1. The exception mechanism waits for all instructions in execution to complete to a 
point where they report all exceptions they will cause. 


2. The processor ensures that all instructions in execution complete in the context in 
which they began execution. 


3. Instructions dispatched after the exception is processed are fetched and executed in 
the context established by the exception mechanism. 


Register settings are shown in Table 5-23. 
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0 
0 “ ; 
ay 0 EP Value is not altered 
-, ME Value is not altered IT 0 
FEO 0O DT 0 


Table 5-23. System Call Exception—Register Settings 


Setting Description 
SRRO Set to the effective address of the instruction following the System Call instruction 


SRR1 0-15 Loaded from bits 16-31 of the instruction 
16-31 Loaded from bits 16-31 of the MSR 
















= 1 3 
ai > EP Value is not altered 
ME Value is not altered IT 0 
FEO 0O DT 0 





When a system call exception is taken, instruction execution resumes at offset x’O00COO’ 
from the physical base address indicated by MSR[EP]. 


5.4.12 Run Mode Exception (x’02000’) 


The MPC601 defines an implementation-specific exception called the run mode exception. 
This exception is taken by the MPC601 under the following circumstances: 


¢ Instruction address compare 

¢ Branch target address compare 

¢ Trace mode (MSR[SE] is set)—Note that other PowerPC processors implement a 
separate trace exception at vector x’00D00’. 


Note that this exception may not be implemented by other PowerPC processors, and that 
this exception can be enabled and disabled using bits 8 and 9 in HID1; the exception is 
enabled when HID1[8,9] = b’01’. When this exception occurs, the registers are set as 
indicated in Table 5-24. 


Table 5-24. Run Mode Exception—Register Settings 


es 


SRRO Set to the address of the instruction that causes the run mode exception 
SRR1 Loaded from bits 0-31 of the MSR. 












When arun mode exception is taken, instruction execution resumes as offset x'02000' from 
the base address indicated by MSR[EP]. 
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Chapter 6 
Memory Management Unit 


This chapter describes the MPC601’s memory management unit (MMU). The primary 
functions of the MMU are to translate logical (effective) addresses to physical addresses 
for memory accesses, I/O accesses (most I/O accesses are assumed to be memory-mapped), 
and I/O controller interface accesses, and to provide access protection on a block or page 
basis. 


There are three types of accesses generated by the MPC601 that require address translation: 
instruction accesses, data accesses to memory generated by load and store instructions, and 
I/O controller interface accesses generated by load and store instructions. 


The MPC601 MMU provides 4-Gbytes of logical address space accessible to supervisor 
and user programs with a 4-Kbyte page size and 256-Mbyte segment size. Block sizes 
range from 128 Kbyte to 8 Mbyte and are software selectable. In addition, the MPC601 
uses an interim 52-bit virtual address and hashed page tables in the generation of 32-bit 
physical addresses. 


The MMU contains three translation lookaside buffers (TLBs). There is a 256-entry, two- 
way set-associative unified (instruction and data address) TLB (UTLB) for storing 
recently-used address translations, and a four-entry fully-associative first-level instruction 
TLB (ITLB) that is used only by instruction accesses for storing recently used instruction 
address translations. Additionally, there is a four-entry block TLB (BTLB) that stores the 
available block address translations (for instruction or data addresses). BTLB entries are 
implemented as the block address translation (BAT) registers that are accessible as 
supervisor special-purpose registers (SPRs). UTLB entries are generated automatically by 
the MPC601 hardware via a search of the page tables in memory. The MPC601 maintains 
all the segment information on-chip in 16 supervisor-level segment registers. 


This chapter describes the MMU address translation mechanisms, the MMU conditions 
that cause MPC6(0)1 exceptions, the instructions used to program the MMU, and the 
corresponding registers. 


The MPC601 MMU relies on the exception processing mechanism for the implementation 
of the paged virtual memory environment and for enforcing protection of designated 
memory areas. Exception processing is described in Chapter5, “Exceptions.” 


MOTOROLA Chapter 6. Memory Management Unit 6-1 


Section 2.3.1, “Machine State Register (MSR),” describes the MSR of the MPC601, which 
controls some of the critical functionality of the MMU. 


The operation of the MPC601MMU conforms to the operating environment defined by the 
PowerPC architecture for 32-bit implementations in most respects. However, the number 
and format of the BAT registers is different, as is the available range of block sizes. In 
addition, the PowerPC architecture defines the concept of guarded memory that is not 
implemented in the MPC601. Also, some MMU instructions of the PowerPC architecture 
(including tlbsync) are not implemented in the MPC601. 


Note that the memory-forced I/O controller interface functionality described for the 
MPC601 is not defined as part of the PowerPC architecture, and will not be present in other 
PowerPC processors. Also note that the hardware implementation details of the MPC601 
MMU are not contained in the architectural definition of PowerPC processors and are 
invisible to the programming model. 


6.1 MMU Overview 


The MPC601 MMU and exception model support demand paged virtual memory. Virtual 
memory management permits execution of programs larger than the size of physical 
memory; demand paged implies that individual pages are loaded into physical memory 
from backing storage only when they are first accessed by an executing program. 


The memory management model of the MPC601 includes the concept of a virtual address 
that is not only larger than that of the maximum physical memory allowed but a virtual 
address space that is also larger than the logical address space. Each logical address 
generated by the MPC601 is 32 bits wide. In the address translation process, a logical 
address is converted to a 52-bit virtual address (as governed by the operating system) and 
then translated back to a 32-bit physical address. 


The operating system is responsible for managing the system’s physical memory resources. 
Consequently, the operating system programs the MMU registers (segment registers, BAT 
registers, and table search descriptor register 1 (SDR1)) and sets up the page tables in 
memory appropriately. The MMU then assists the operating system by managing page 
status and maintaining the recently-used address translations on-chip for quick access. 


The MPC601 logical address spaces are divided into 256-Mbyte regions called segments 
or other large regions called blocks (128 Kbyte-8 Mbyte). Segments that correspond to 
_ memory or memory-mapped devices can be further subdivided into smaller regions called 

pages (4 Kbyte). For each block or page, the operating system creates an address descriptor 
(page table entry (PTE) or BTLB entry) that the MMU uses to generate the physical address 
and the protection and other access control information when an address within the block 
or page is accessed. Address descriptors for pages reside in tables (as PTEs) in the physical 
memory; for faster accesses, the MMU maintains on-chip copies of recently used PTEs in 
the ITLB and UTLB, and keeps the block information on-chip in the BTLB (comprised of 
the BAT registers). 
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This section provides an overview of the high-level organization and operational concepts 
of the MPC601 MMU, and a summary of all MMU control registers. Section 2.3.3.5, 
“Table Search Descriptor Register 1 (SDR1),” describes the SDR1 register, Section 2.3.2, 
“Segment Registers,” describes the segment registers, and Section 2.3.1, “Machine State 
Register (MSR),” describes the MSR, and Section 2.3.3.11, “BAT Registers,” describes the 
BAT registers. 


6.1.1 Memory Addressing 


A program references memory using the effective address computed by the processor when 
it executes a load, store, branch, or cache instruction, and when it fetches the next 
instruction. The effective (logical) address is translated to a physical address according to 
the procedures described throughout this chapter. The memory subsystem uses the physical 
address for the access. 


For a complete discussion of effective address calculation, see Section 3.1.1, “Effective 
Address Calculation.” 


6.1.2 MMU Organization 


Figure 6-1 shows the conceptual organization of the MMU and its relationship to some of 
the other functional units in the MPC601. The instruction unit generates all instruction 
addresses; these addresses are both for sequential instruction prefetches and addresses that 
correspond to a change of program flow. The integer unit generates addresses for data 
accesses (both for memory and the I/O controller interface). 


After an address is generated, the upper order bits of the logical address, LAO-LA19 (or a 
smaller set of address bits, LAQ—LAn, in the cases of blocks), are translated by the MMU 
into physical address bits PAO-PA19. Simultaneously, the lower order address bits, A20— 
A31 (that are untranslated and therefore considered both logical and physical), are directed 
to the on-chip cache where they form the index into the eight-way set-associative tag array. 
After translating the address, the MMU passes the higher-order bits of the physical address 
to the cache, and the cache lookup completes. For cache-inhibited accesses or accesses that 
miss in the cache, the untranslated lower order address bits are concatenated with the 
translated higher-order address bits; the resulting 32-bit physical address is then used by the 
memory unit and the system interface, which accesses external memory. 


In addition to the upper-order address bits, the MMU automatically keeps an internal 
indicator of whether each access was generated as an instruction or data access and a 
supervisor/user indicator that reflects the state of the PR bit of the MSR when the logical 
address was generated. In addition, for data accesses, there is an indicator of whether the 
access is for a load or a store operation. This information is then used by the MMU to 
appropriately direct the address translation and to enforce the protection hierarchy 
programmed by the operating system. See Section 2.3.1, “Machine State Register (MSR),” 
for more information about the MSR. 
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Figure 6-1. MMU Block Diagram 
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For instruction accesses, the MMU first performs a lookup in the four entries of the ITLB 
for the physical address translation. Instruction accesses that miss in the ITLB and data 
accesses cause a lookup in the UTLB and BTLB for the physical address translation. In 
most cases, the physical address translation resides in one of the TLBs and the physical 
address bits are readily available to the on-chip cache. In the case where the physical 
address translation misses in the TLBs, the MPC60)1 automatically performs a search of the 
translation tables in memory using the information in the SDR1 and the corresponding 
segment register. 


6.1.3 Address Translation Mechanisms 
The MPC601 supports the following four main types of address translation: 


¢ Page address translation—translates the page frame address for a 4-Kbyte page size 


¢ Block address translation—translates the block number for blocks that range in size 
from 128 Kbyte to 8 Mbyte 


¢ [/O controller interface address translation—used to generate I/O controller 
interface accesses on the external bus 


¢ Direct address translation—when address translation is disabled, the physical 
address is identical to the logical address 


Figure 6-2 shows the four main address translation mechanisms provided by the MPC60)1. 
The segment registers shown in the figure control both the page and I/O controller interface 
address translation mechanisms. When an access uses the page or I/O controller interface 
address translation, one of the 16 on-chip segment registers is selected by the highest-order 
logical address bits. A control bit in the corresponding segment register then determines if 
the access is to memory (memory-mapped) or to the I/O controller interface space. 


For memory accesses selected by the segment register, the segment register information is 
used to generate the interim 52-bit virtual address. Page address translation corresponds to 
the conversion of this virtual address into the 32-bit physical address used by the cache or 
by external memory. In most cases, the physical address for the page resides in the UTLB 
and is available for quick access. However, if the page address translation misses in the 
UTLB, the MPC6()1 automatically searches the page tables in memory (using the virtual 
address information and a hashing function) to locate the required physical address. 


Block address translation occurs in parallel with page and I/O controller interface address 
translation and is similar to page address translation, except that there are fewer upper-order 
logical address bits to be translated into physical address bits (more lower-order address 
bits (at least 17) are untranslated to form the offset into a block). Also, instead of segment 
registers and a UTLB, block address translations use the on-chip BAT registers as a BTLB. 
If the logical address of an access matches the corresponding field of a BAT register, the 
information in the BAT register is used to generate the physical address; in this case, the 
results of the page translation (occurring in parallel) are ignored. 
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Figure 6-2. Address Translation Types 


I/O controller interface address translation is enabled when the I/O controller interface 
translation control bit (T-bit) in the selected segment register (segment register selected by 
the highest-order address bits) is set. In this case, the remaining information in the segment 
register is interpreted as identifier information that is used with the remaining logical 
address bits to generate the packets used in an I/O controller interface access on the external 
interface; additionally, no UTLB lookup or page table search is performed and the BTLB 
lookup results are ignored. For more information about the I/O controller interface 
operations, see Section 9.6, “I/O Controller Interface Operation.” 


A special case of I/O controller interface address translation (not shown in Figure 6-2) is 
supported that forces an I/O controller interface address translation to be interpreted as a 
memory access (that is, it uses the usual memory access protocol rather than the I/O 
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controller interface access protocol on the external interface). This occurs when a field in 
the selected segment register (with the T-bit set) is encoded as memory-forced I/O 
controller space. This feature effectively allows the specification of a 256-Mbyte “block” 
of memory (with a common physical block number) with the use of only one segment 
register, bypassing the page and block address translation and protection mechanisms 
described above. 


Direct address translation occurs when address translation is disabled; in this case the 
physical address generated is identical to the logical address. The translation of addresses 
for instruction and data accesses is enabled (and disabled) independently with the MSR[IT] 
and MSR[DT] bits, respectively. Thus when the instruction unit generates an instruction 
access, and instruction address translation is disabled (MSR[IT] = 0), the resulting physical 
address is identical to the logical address and all other translation mechanisms are ignored. 


When a data access occurs and MSR[DT] = 0, the resulting physical address is identical to 
the logical address with one exception—I/O controller interface address translation for data 
accesses is allowed, even when MSR[DT] = 0. In this case, the segment registers are used 
in the same way as if translation were enabled. Note that this case of data accesses to the 
1/O controller interface while MSR[DT] = 0 will not be supported in other PowerPC 
processors. 


6.1.4 Memory Protection Facilities 


In addition to the translation of logical addresses to physical addresses, the MMU provides 
access protection of supervisor areas from user access and can designate areas of memory 
as read-only. Table 6-1 shows the four protection options supported by the MPC6(01. 


Table 6-1. Access Protection Options 


awa er 


Each of these options is enforced at the block or page level. Thus, the supervisor-only 
option allows only read and write operations generated while the MPC601 is operating in 
supervisor mode (corresponding to MSR[PR] = 0) to use the selected address translation 
(block or page). User accesses that map into these blocks or pages cause an exception to be 
taken. 










As shown in the table, the supervisor-write-only option allows both user and supervisor 
accesses to read from the selected area of memory but only supervisor programs can update 
(write to) that area. There is also an option that allows both supervisor and user programs 
read and write access (both user/supervisor option), and finally, there is an option to 
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designate an area of memory as read-only, both for user and supervisor programs (both 
read-only option). 


For I/O controller interface segments, the MMU calculates a “key” bit based on the 
protection values programmed in the segment register, and the specific user/supervisor and 
read/write information for the particular access. However, this bit is merely passed on to 
the system interface to be transmitted in the context of the I/O controller interface protocol 
as described in Section 9.6, “I/O Controller Interface Operation.” The MMU does not itself 
enforce any protection or cause any exception based on the state of the key bit for these 
accesses. The I/O controller device or other external hardware can optionally use this bit to 
enforce any protection required. 


6.1.5 Page History Information 


The MPC601 MMU also maintains reference (R) and change (C) bits in the page address 
translation mechanism that can be used as history information relevant to the page. This 
information can then be used by the operating system to determine which areas of memory 
to write back to disk when new pages must be allocated in main memory. While these bits 
are initially programmed by the operating system into the page table, the MPC6()1 
automatically updates these bits when required. Note that the updates to these bits in the 
page tables are performed with standard read and write transactions on the bus (not locked 
read-modify-write operations). However, when multiple MPC601 devices have shared 
access to the page tables, the bit settings are guaranteed to be updated correctly. 


6.1.6 General Flow of MMU Address Translation 


When an instruction or data access is generated and the corresponding instruction or data 
translation is disabled (MSR[IT] = 0 or MSR[DT] = 0), direct translation is used and the 
access continues to the cache. When the selected segment register indicates that the access 
is an I/O controller interface access, I/O controller interface address translation occurs. See 
Section 6.5, “Selection of Address Translation Type” for more information regarding the 
selection of address translation mode used for all cases. 


For instruction accesses, if translation is enabled (MSR[IT] =1), the ITLB is first checked 
for a matching page or block address translation. If there is a miss, then the MMU uses the 
block and page address translation mechanisms to find the address translation. Figure 6-3 
shows the flow used to search for the block or page address translation. 


Although the MPC601 performs the block and page TLB lookups simultaneously, the flow 
diagram shows that if a BTLB hit occurs, that particular translation is performed regardless 
of the results of the UTLB lookup. If the BTLB misses, the results of the UTLB search are 
considered. If the UTLB hits, the page translation occurs and the physical address bits are 
forwarded to the cache (if the access is cacheable). 
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Figure 6-3. MMU Block and Page Address Translation Flow 






If the UTLB misses, the MPC6()1 automatically searches the page tables in memory. If the 
page table entry (PTE) is successfully read, a new UTLB entry (and an ITLB entry for the 
instruction access case) is created and the page translation is once again attempted. This 
time, the UTLB (and ITLB for instruction access case) is guaranteed to hit. If the PTE is 
not found by the table search operation, an instruction access or data access exception is 
generated. 
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Note that if either the BTLB or UTLB results in a hit, the access is qualified with the 
appropriate protection bits. If the access is determined to be protected (not allowed), an 
exception (instruction access or data access) is generated. 


6.1.7 Memory/MMU Coherency Model 
The memory model of the MPC601 provides the following features: 
¢ Performance benefits of weak ordering of memory accesses 
* Memory coherency among processors and between a processor and I/O devices 
controlled at the block and page level 
¢ Instructions that ensure a coherent and ordered memory state. 
¢ Processor address order guaranteed 


The memory implementations in MPC60)1 systems can take advantage of the performance 
benefits of weak ordering of memory accesses between processors or between processors 
and other external devices without any additional complications. The MMU assumes that 
all accesses are ordered. Thus, the priority of accesses is determined at the external 
interface in a way that provides maximum throughput for most cases. 


In addition, at the system level, the memory coherency among processors and between a 
processor and I/O devices is programmed through the following three mode control bits in 
the MMU: 


¢ Write-through (W bit) 
¢ Caching-inhibited (I bit) 
¢* Memory coherency (M bit) 


Both the block and page address translation mechanisms contain the WIM bits for each 
TLB entry; these bits are used to control all accesses that correspond to the particular block 
or page. The four possible combinations of the W and M bits yield modes that are supported 
for I = 0 (caching-allowed) as shown in Table 6-2. For the caching-inhibited (I=1) case, 
there are only two modes defined, corresponding to W=0/M = (), and W=0/M=1. 


Table 6-2. Defined WIM Combinations 
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The MPC601 also provides instructions (the cache instructions, isync, sync, eieio, lwarx, 
and stwex.) to ensure a coherent and ordered memory state. These instructions are 
described in Chapter 3, “Addressing Modes and Instruction Set Summary,” and in Memory 
accesses performed by a single processor appear to complete sequentially from the view of 
the programming model but may complete out of order with respect to the ultimate 
destination in the memory hierarchy. Order is guaranteed at each level of the memory 
hierarchy for accesses to the same address from the same processor. 


Memory coherency can be enforced externally by a snooping bus design, a centralized 
cache directory design, or other design that can take advantage of these coherency features. 


6.1.8 Effects of Instruction Prefetch on MMU 


Speculative instruction execution occurs when the MPC601 executes instructions in 
advance in case the result is needed. If subsequent events indicate that the speculative 
instruction should not have been executed, the processor abandons the results produced by 
that instruction. Typically, the MPC601 executes instructions speculatively when it has 
resources that would otherwise be idle, so the operation is done at little or no cost. 


The MPC601 executes computational instructions speculatively (beyond a branch 
instruction) and performs instruction prefetches (it fetches instructions ahead in the 
instruction stream). However, the MPC601 does not execute any load or store instructions 
speculatively. Speculative execution of computational instructions does not involve the 
MMU. 


To avoid instruction fetch delay, the processor typically prefetches instructions. Such 
instruction prefetching is speculative in that prefetched instructions may not be executed 
due to intervening branches or exceptions. 


The following constraints are enforced for instruction prefetching: 


¢ Prefetching does not occur across a page boundary (4 Kbyte). The processor only 
fetches from a new page when it is certain that at least the first instruction to be 
fetched from the new page is required for execution by the program. 


¢ Prefetching from non-cacheable (I=1) memory does not occur, except when an 
instruction within the boundaries of a cache block (sector) is needed. In that case, 
the subsequent words in the cache block (sector) are prefetched. 


¢ Neither fetching nor prefetching from I/O controller interface segments (T=1) 
occurs except for instruction fetches (or prefetches) made from memory-forced I/O 
controller interface segment. See 6.10, “I/O Controller Interface Address 
Translation” for more information about !/O controller interface segments. 


Machine check exceptions that result from instruction prefetching may be generated, even 
if the instruction fetched would not have been executed because of a previous branch or 
change in program flow. See Section 5.4.2, “Machine Check Exception (x’00200’),” for 
more information on the machine check exception. 
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Memory in the MPC601 systems is considered not “guarded” in the sense that prefetching 
may occur to any area of memory. For example, if a data area is adjacent to an instruction 
area of memory, the MPC6()1 could prefetch from that data area. Furthermore, if a word in 
that data area contains the encoding for an unconditional branch instruction, the processor 
could even continue to prefetch from the address it interprets as the target of the branch. 
Care may be required to prevent these situations, particularly if peripheral devices that 
cannot recover from extraneous accesses reside in these areas. Areas of memory in other 
PowerPC processors may be designated as guarded within the MMU in that speculative 
operations do not occur. 


6.1.9 Breakpoint Facility 


Through the use of the HIDx registers (HID1, HID2, and HID5), the MPC6()1 has the 
ability to perform a breakpoint operation for both instruction and data accesses 
independently. For instruction accesses, the logical addresses of instructions in decode are 
compared with the address specified in the instruction address breakpoint register (IABR). 
If there is a match, then the processor takes a run mode exception. Similarly, data 
breakpoints occur when the logical address of a data access matches the address specified 
in the data address breakpoint register (DABR) register. However, when a data address 
matches, the MPC601 takes a data access exception. 


The instruction and data breakpoint functionality is controlled by bit settings in the 
MPC6(0)1 debug modes register (HID1). Various combinations and levels of breakpoints can 
be enabled. Section 2.3.3.12, “MPC601 Implementation-Specific HID Registers” describes 
the breakpoint functionality provided in the MPC601. Note that these breakpoints occur 
completely independently of the MSR[DT] and MSR[IT] bit settings. 


6.1.10 MMU Exceptions Summary 


In order to complete any memory access, the logical address must be translated to a 
physical address. An MMU exception condition occurs if this translation fails for one of the 
following reasons: 


¢ There is no valid entry in the page table for the page specified by the logical address 
(and segment register). 


¢ An address translation is found but the access is not allowed by the memory 
protection mechanism. 


Most MMU exception conditions cause either the instruction access exception or the data 
access exception to be taken. The state saved by the MPC601 for each of these exceptions 
contains information that identifies the address of the failing instruction. Refer to 
Chapter 5, “Exceptions,” for a more detailed description of exception processing. 


There are 11 types of conditions that can cause an MMU exception to occur. The exception 
conditions map to the MPC601 exception as shown in Table 6-3. The only MMU exception 
condition recognized when MSR[IT]=0 is the instruction breakpoint match condition. The 
only exception conditions that occur when MSR[DT]=0 are the data breakpoint match 
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condition and the conditions that cause the alignment exception for data accesses. For more 
detailed information about the conditions that cause the alignment exception (in particular 
for string/multiple instructions) see 5.4.6, “Alignment Exception (x’00600’).” 


Table 6-3. MMU Exception Conditions/Exception Mapping 
ee ee 


Page fault No matching PTE found in page | | access: instruction access exception 
tables SRR1[1] = 1 
D access: data access exception 
DSISR{1]=1 


| access: instruction access exception 
SRAR1[4] = 1 























Conditions described in 
Table 6-7 for block 


Block protection violation 





D access: data access exception 
DSISR[4]=1 






Page protection violation Conditions described in | access: instruction access exception 
Table 6-7 for page SRR1[4] = 1 







D access: data access exception 
DSISR[4]=1 


















debz with W or | = 1 dcbz instruction to write-through 
or cache-inhibited segment or 


block 


Instruction access to I/O Attempt to fetch instruction when | Instruction access exception 
controller interface space SR[T]=1, SR[BUID] + ‘07F' Causes no SAR1 bits to be set 


lwarx, stwex., Iscbx instruction | Reservation instruction or load Data access exception 
to I/O controller interface space string and compare byte DSISR{5] = 1 

instruction when SRIT]=1, 
SR[BUID] # ‘07F' 


Floating-point load or store to I/O | FP memory access when Alignment exception 

controller interface space SR[T]=1, SR[BUID] + ‘07F’ 

Instruction breakpoint match Instruction address matches the | Run mode exception 
address in HiD2 

Data breakpoint match Data address matches the Data access exception 
address in HID5S DSISR[9] = 1 


Operand misalignment: Operand crosses a 256 Mbyte Alignment exception 
256 Mbyte boundary boundary (regardless of 
MSR[DT] and MSR[IT] setting) 


Alignment exception 












































Operand misalignment: 
4 Kbyte boundary 


Page translation (SR[T] = 0 and 
no BTLB match), and operand 
crosses a 4 Kbyte boundary 


Alignment exception 
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6.1.11 MMU Instructions and Register Summary 


Table 6-4 summarizes the instructions of the MPC601 that specifically control the MMU. 
For more detailed information about the instructions, refer to Chapter 10, “Instruction Set.” 


Table 6-4. Instruction Summary—Control MMU 
a 
mtsr SR,rS Move to Segment Register 
SR[SR#]<— rS 


mtsrin rS,rB | Move to Segment Register Indirect 
SR[rB[O-3]]}<-rS 

mfsr rD,SR Move from Segment Register 
rD<—SR[SR#] 


mfsrin rD,rB | Move from Segment Register Indirect 
rD<—-SA[rB[0-3]] 


tlbie rB Translation Lookaside Buffer Invalidate Entry 
If TLB hit (for logical address specified as rB), TLB[V]<—0 
Causes TLB! operation on the system bus. 


Table 6-5 summarizes the registers that the operating system uses to program the MMU. 
These registers are accessible to supervisor-level software only. These registers are 
described in detail in Chapter 2, “Registers and Data Types.” 


Table 6-5. MMU Registers 


Segment registers The sixteen 32-bit segment registers are present only in 32-bit implementations of 

(SRO-SR15) PowerPC. Figure 6-13 shows the format of a segment register. The fields in the 
segment register are interpreted differently depending on the value of bit 0. The 
segment registers are accessed by the MPC601-specific mtsr, mtsrin, mfsr, and 
mfsrin instructions 











BAT registers The MPC601 includes eight block-address translation registers (BATs), organized as 

(BATOU-BATS3U and four pairs (BATOU-BAT3U and BATOL—BATSL). Figure 6-6 and Figure 6-7 show the 

BATOL-BAT3L) format of the upper and lower BAT registers. These are special-purpose registers 
that are accessed by the mtspr and mfspr instructions. 


Table search descriptor The 32-bit table search descriptor register 1 (SDR1) specifies the variables used in 
register 1 accessing the page tables in memory. This is a special-purpose register that is 
(SDR1) accessed by the mtspr and mfspr instructions. 





6.1.12 TLB Entry Invalidation 


The UTLB (and ITLB) maintains on-chip copies of the PTEs that are resident in physical 
memory. The MPC601 has the ability to invalidate resident UTLB entries through the use 
of the tlbie instruction. Additionally, the tlbie instruction also causes a TLB invalidate 
broadcast (an address-only operation) to occur on the system bus so that other processors 
also invalidate their resident copies of the matching PTE. See Chapter 10, “Instruction Set” 
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for detailed information about the tlbie instruction and Section 9.3.2.2.1, “Transfer Type 
(TTO-TT4) Signals” for more information on address-only bus transactions. 


The snooping hardware of the MPC6()1 detects when other processors perform a TLB 
invalidate broadcast on the bus. In the case of a hit with an on-chip UTLB entry, the 
MPC6(0)1 performs the following: 


1. Prevents execution of any new load, store, cache control or tlbie instructions and 
prevents any new reference or change bit updates 


2. Waits for completion of any outstanding memory operations (including updates to 
the reference and change bits associated with the entry to be invalidated) 


3. Invalidates the two entries (both associativity classes) in the UTLB indexed by the 
matching address 


4. Resumes normal execution 


6.2 ITLB Description 


The MPC601 implements a four-entry, fully-associative TLB for storing the most recently 
used instruction address translations. The MPC601 automatically generates an entry in the 
ITLB whenever the page or block address translation mechanism generates a new logical- 
to-physical mapping for a page or block used for instruction fetch. Each ITLB entry can 
contain the translation information for either an entire block or a page. The MPC60)1 uses 
the ITLB for address translation of instruction accesses when MSR[IT] = 1. 


The instruction unit accesses the ITLB independently of the rest of the MMU. Therefore, 
when instruction accesses hit in the ITLB, the page and block translation mechanisms are 
available for use by data accesses simultaneously. 


The MPC6()1 also automatically maintains the integrity of the entries in the ITLB by 
purging the contents when any of the following conditions occur: 

* An mtsr or mtsrin instruction is executed 

¢ An mtspr instruction that modifies any of the BAT registers is executed 

¢ A tlbie instruction is executed 

¢ ATLB invalidate operation is detected on the system interface (via snooping) 


Since these conditions potentially cause the MMU context to be changed, the ITLB entries 
may not longer be valid. Therefore, the MMU automatically detects these conditions and 
Clears all the valid bits in the ITLB array. 


Finally, the MPC601 replaces ITLB entries on a least-recently-used (LRU) basis. 
Throughout the remainder of this chapter, the page and block translations that are resident 
in the ITLB are described within the context of page address translation and block address 
translation, as the contents of the ITLB are always a subset of translations that were 
generated for the UTLB and/or the BTLB. 
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Accesses to the ITLB are transparent to the executing program, except that hits in the ITLB 
contribute to a higher overall instruction throughput by allowing data translations to occur 
in parallel. 


6.3 Memory/Cache Access Modes 


All instruction and data accesses are performed under the control of the three mode control 
bits that are defined by the MMU for each access. The three mode control bits, W, I, and M, 
have the following effects. The W and I bits control how the processor performing the 
access uses its own cache. The M bit specifies whether the processor performing the access 
must use the memory coherency protocol to ensure that all copies of the addressed memory 
location are consistent. 


When an access requires coherency, the processor performing the access must inform the 
coherency mechanisms throughout the system that the access requires memory coherency. 
The M bit determines the kind of access performed on the bus (global or local). Note that 
these mode-control bits are relevant only when an address is translated and are not saved 
along with data in the on-chip cache (for cacheable accesses). Once an access has been 
translated, the MESI bits in the cache then control the coherency to that cache location 
made by subsequent accesses from other processors. See Chapter 4, “Cache and Memory 
Unit Operation,” for more information about cache accesses. 


The operating system programs the WIM bits for each page or block as required. The WIM 
bits reside in the BAT registers for block address translation and in the PTEs for page 
address translation. Thus these bits are programmed as follows: 


¢ The operating system uses the mtspr instruction to program the WIM bits in the 
BAT registers for block address translation. 


¢ The operating system programs the WIM bits for each page into the PTEs in system 
memory as it sets up the page tables. 


Note that for accesses performed with direct address translation (MSR[IT]=0 or 
MSR[DT]= 0 for instruction or data access, respectively), the WIM bits are automatically 
generated as b’(()1’ (the data is write-back, caching is enabled, and memory coherency is 
enforced). 


6.3.1 Write-Through Bit (W) 


When an access is designated as write-through (W=1), if the data is in the cache, a store 
Operation updates the cached copy of the data. In addition, the update is written to the 
external memory location (as described below). Store-combining compiler optimizations 
are allowed for write-through accesses except when the store instructions are separated by 
a sync instruction. Note that a store operation that uses the write-through mode may cause 
any part of valid data in the cache to be written back to main memory. 
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The definition of the external memory location to be written to in addition to the on-chip 
cache depends on the implementation of the memory system but can be illustrated by the 
following examples: 


* RAM—The store must be sent to the RAM controller to be written into the target 
RAM. 


¢« I/O device—tThe store must be sent to the I/O control hardware to be written to the 
target register or memory location. 


In systems with multilevel caching, the store must be written to at least a depth in the 
memory hierarchy that is seen by all processors and devices. 


Accesses that correspond to W=0 are considered write-back. For this case, although the 
store operation is performed to the cache, it is only made to external memory when a copy- 
back operation is required. Use of the write-back mode (W=()) can improve overall 
performance for areas of the memory space that are seldom referenced by other masters in 
the system. See Chapter 4, “Cache and Memory Unit Operation,” for more information 
about cache accesses. 


6.3.2 Caching Inhibited Bit (I) 


If I=1, the memory access is completed by referencing the location in main memory, 
completely bypassing the on-chip cache of the MPC6()1. During the access, the accessed 
location is not loaded into the cache nor is the location allocated in the cache. If a copy of 
the accessed data is in the cache, that copy is not updated, flushed, or invalidated. Data 
accesses from more than one instruction may not be combined (as a compiler optimization) 
for cache-inhibited operations. 


6.3.3 Memory Coherence Bit (M) 


This mode control bit is provided to allow improved performance in systems where 
hardware-enforced coherency is relatively slow, and software is able to enforce the required 
coherency. When M=0, the MPC6()1 does not enforce data coherency. When M=1, the 
processor enforces data coherency and the corresponding access is considered to be a 
global access. When the M bit is set, and the access is performed to external memory, the 
GBL signal is asserted and the access is designated as global. Other processors affected by 
the access must then respond to this global access and signal whether it is shared. If the data 
in another processor is modified, then address retry is signaled. 
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6.3.4 W, I, and M Bit Combinations 
Table 6-6 summarizes the six combinations of the WIM bits defined for the MPC601. 


Table 6-6. Combinations of W, |, and M Bits 


Data may be cached. 
Loads or stores whose target hits in the cache use that entry in the cache. 
Exclusive ownership of the block containing the target location is not required for store accesses 


and coherency operations for the block do not occur when fetching the block, storing it back, or 
changing its state from shared to exclusive. 


Data may be cached, 

Loads or stores whose target hits in the cache use that entry in the cache. 

Memory coherency is enforced by hardware as follows: exclusive ownership of the block 
containing the target location is required before store accesses are allowed. When fetching the 
block, the processor indicates on the bus transaction that coherency is to be enforced. If the state 
of the block is shared-unmodified, the processor must gain exclusive use of the block before 
storing into it. 


This encoding is used for addresses translated via direct address translation (MSR[IT] = 0 or 
MSR[DT] = 0). 


Caching is inhibited. 
The access is performed to external memory, completely bypassing the cache. 


Hardware enforced memory coherency is not required. 


Caching is inhibited. 
The access is performed to external memory, completely bypassing the cache. 
Memory coherency must be enforced by external hardware (MPC601 asserts GBT). 


Data may be cached. 
Loads whose target hits in the cache use that entry in the cache. 


Stores are written to external memory. The target location of the store may be cached and is 
updated on a hit. 


Exclusive ownership of the block containing the target location is not required for store accesses 
and coherency operations for the block do not occur when fetching the block, storing it back, or 
changing its state from shared to exclusive. 


Data may be cached. 
Loads whose target hits in the cache use that entry in the cache. 


Stores are written to external memory. The target location of the store may be cached and is 
updated on a hit. 


Memory coherency is enforced by hardware as follows: exclusive ownership of the block 
containing the target location is required before store accesses are allowed. When fetching the 
block, the processor indicates on the bus transaction that coherency is to be enforced. If the state 
of the block is shared, the processor must gain exclusive use of the block before storing into it. 





If the system software maps the same physical page with multiple page table entries that 
have different W, I, or M values, the results of the translation are undefined. 
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6.4 General Memory Protection Mechanism 


Another aspect of the MMU that is programmed at the block and page level is the memory 
protection option. The memory protection mechanism allows selectively granting read 
access, granting read/write access, and prohibiting access to areas of memory based on a 
number of control criteria. 


The memory protection mechanism is used by both the block and page address translation 
mechanisms in a similar way, as described here. For specific information unique to block 
address translation, refer to Section 6.7.4, “Block Memory Protection.” For specific 
information unique to page address translation, refer to Section 6.8.5, “Page Memory 
Protection.” 


For both block and page address translation, the memory protection mechanism is 
controlled by the following: 
¢ MSR[PR], which defines the mode of the access as follows: 
— MSR[PR]=0 corresponds to supervisor mode 
— MSR[PR]=I corresponds to user mode 
* Ks and Ku, the supervisor and user key bits, which define the key for the block or 
page 
¢ The PP bits, which define the access options for the block or page 


The key bits (Ks and Ku) and the PP bits are located as follows for block and page address 
translation: 


¢ Ksand Ku are located in the upper BAT register for block address translation and in 
the selected segment register for a page address translation. 


¢ The PP bits are located in the upper BAT register for block address translation and 
in the PTE for page address translation. 


The key bits, the PP bits, and the MSR[PR] bit are used as follows: 


¢ When an access is generated, one of the key bits (Ks or Ku) is selected to be the key 
as follows: 
— For supervisor accesses (MSR[PR]=0), the Ks bit is used and Ku is ignored 
— For user accesses (MSR[PR]=1), the Ku bit is used and Ks is ignored 


¢ The selected key is used with the PP bits to determine if the load or store access is 
allowed. 


Table 6-7 shows the types of accesses that are allowed for the general case (all possible Ks, 
Ku, and PP bil combinations). 
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Table 6-7. Access Protection Control with Key 


Block Type 


Read/write 
Read/write 
Read/write 
Read only 
No access 
Read only 
Read/write 


Read only 





' Ks or Ku selected by state of MSR[PR] 
2 PP protection option bits in BTLB entry 
or PTE 


Thus, the conditions that cause a protection violation are depicted in Table 6-8. Any access 
attempted (read or write) when the key = 1 and PP = 00, results in a protection violation 
exception condition. When key = 1 and PP=()1, an attempt to perform a write access causes 
a protection violation exception condition. When PP = 10, all accesses are allowed, and 
when PP = 11, write accesses always cause an exception. The MPC6()1 takes either the 
instruction access exception or the data access exception (for an instruction or data access, 
respectively) when there is an attempt to violate the memory protection. 


Table 6-8 . Exception Conditions for Key and PP Combinations 


[> [os 
oe 
se 
ce 
eae 


Although any combination of the Ks, Ku and PP bits is allowed, the Ks and Ku bits can be 
programmed so that the value of the key bit for Table 6-7 directly matches the MSR[PR] 
bit for the access. In this case, the encoding of Ks=0 and Ku=1 is used for the BTLB entry 
or the PTE, and the PP bits then enforce the protection options shown in Table 6-9. 
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Table 6-9. Access Protection Encoding of PP Bits 


Pp 
Field 


[a [ Ranson | Raatoned [Reais [| 


However, if the setting Ks=1 is used, supervisor accesses are treated as user reads and 
writes with respect to Table 6-9. Likewise, if the setting Ku=() is used, user accesses to the 
block or page are treated as supervisor accesses in relation to Table 6-9. Therefore, by 
modifying one of the key bits (in either the BAT register or the segment register), the way 
the MPC6()1 interprets accesses (supervisor or user) in a particular block or segment can 
easily be changed. Note, however, that only supervisor programs can modify the key bits 
for the block or the segment as access to the BAT registers and the segment registers is 
privileged. 

















Supervisor | Supervisor 
Read Write 
(Key=0) (Key=0) 












User Write 
(Key=1) 


User Read 
(Key=1) 






When the memory protection mechanism prohibits a reference, one of the following occurs, 
depending on the type of access that was attempted: 


* For data accesses, a data access exception is generated and bit 4 of DSISR is set. If 
the access is a store, bit 6 of DSISR is also set. 

¢ For instruction accesses, an instruction access exception is generated and bit 4 of 
SRR1 is set. 


See Chapter 5, “Exceptions,” for more information about these exceptions. 


6.5 Selection of Address Translation Type 


A description of the selection flow for determining the type of address translation to be 
performed is provided in Figure 6-4. The selection of address translation type differs for 
instruction and data accesses in that I/O controller interface accesses are not allowed for 
instruction accesses when instruction address translation is disabled, and I/O controller 
interface accesses for data occur without regard for the enabling of data address translation. 


6.5.1 Address Translation Selection for Instruction Accesses 


Addresses for instruction accesses are translated under control of the IT bit of MSR. When 
any context-synchronizing event occurs within the MPC601, any prefetched instructions 
are discarded and refetched using the updated state of MSR[IT]. 
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Logical Address 
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Data Access 


Select Segment 
Register 
O 


otherwise (T=0) VO Cont. I/F 
Address (T=1) 


Translation Enabled Translation Disabled Perform 1/O Cont. 
(MSR[DT]=1) (MSR[DT}=0) V/F Translation 
Not I/O Cont. I/F 
Perform Direct 
Translation 


Instruction Access 





Translation Enabled Translation Disabled 
(MSRI[IT}=1) (MSRAI[IT]=0) 


Perform Lookup Perform Direct 
in ITLB Translation 


Hit Miss 





















Extract Physical 
Address from ITLB 


Select Segment 
Register 


‘ 








Continue Access 


to Cache Address (T=0) 





YO Cont. I/F 
Address (T=1) 





Perform I/O Cont. 
VF Translation 


Perform Block or Page 
Address Translation 
Figure 6-4. Address Translation Type Selection 


6.5.1.1 Instruction Address Translation Disabled: MSR[IT]=0 


When instruction address translation is disabled, designated by MSR[IT]=0, the logical 
address is interpreted as described in Section 6.6, “Direct Address Translation." 


6.5.1.2 Instruction Address Translation Enabled: MSR[IT]=1 


When instruction address translation is enabled (MSR[IT] = 1), instruction fetching occurs 
under control of one of the following address translation mechanisms: 


¢ Page address translation 
* Block address translation 


Note that for either of these translation mechanisms, the ITLB is first checked for the 
address translation. If the ITLB misses, then the corresponding segment register is accessed 
to see if the access is to the I/O controller interface space. If the access is not to the I/O 
controller interface space, the page and block address translation mechanisms are used as 
shown in Figure 6-3. 
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In most cases, instructions cannot be fetched from the I/O controller interface segments and 
attempting to execute an instruction fetched from an I/O controller interface segment 
Causes an instruction access exception. However, instruction fetches are allowed when the 
address translation maps to segments with the T bit set (I/O controller interface segment) 
and with the memory-forced I/O controller interface encoding. This case is described in 
more detail in Section 6.10.4, “Memory-Forced I/O Controller Interface Accesses.” 


6.5.2 Address Translation Selection for Data Accesses 


As shown in Figure 6-4, for data accesses, the corresponding segment register is selected 
independent of the DT bit of MSR. Addresses for data accesses are translated first under 
control of the T bit of the selected segment register. If T=1, the translation is to an I/O 
controller interface segment. Otherwise, the translation is governed by the state of the DT 
bit of MSR. When the state of MSR[DT] changes, subsequent accesses are made using the 
new state of MSR[DT]. 


6.5.2.1 1/O Controller Interface Address Translation: T=1 in Segment 
Register 

I/O controller interface segments are used independently of MSR[DT]. When the segment 

register indexed by the upper-order logical address bits has the T bit set, the access is 

considered an I/O controller interface access and the I/O controller interface protocol of the 

external interface is used to perform the access to I/O controller space. 


Note, however, that an x’07F’ encoding in the BUID field of the segment register defines 
an access as a memory-forced I/O controller interface access. In this case, the memory 
protocol is used on the external interface. See Section 6.10, “I/O Controller Interface 
Address Translation” for more information on address translation for I/O controller 
interface accesses. 


6.5.2.2 Data Translation Disabled: MSR[DT]=0 

When MSR[DT]=0, the logical address is interpreted as described in Section 6.6, “Direct 
Address Translation.” Note that as shown in Figure 6-4, the determination of whether the 
address maps to an I/O controller interface segment occurs prior to the checking of 
MSR[DT]. Therefore, I/O controller interface address translation occurs independently of 
MSR[DT] for data accesses. The attempted execution of the eciwx or ecowx instructions 
while MSR[DT]=0 causes boundedly undefined results. 


6.5.2.3 Data Translation Enabled: MSR[DT]=1 
When data address translation is enabled (MSR[DT] = 1), data accesses employ one of the 
following translation mechanisms: 

¢ Page address translation 

¢ Block address translation 


The block and page address translation mechanisms locate the physical address for the 
access as described in Figure 6-3. 
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6.6 Direct Address Translation 


If address translation is disabled (MSR[IT] = 0 or MSR[DT] = 0) for a particular access 
(fetch, load, or store), the logical address is treated as the physical address and is passed 
directly to the memory subsystem as a direct address translation. 


The addresses for accesses that occur in direct translation mode bypass all memory 
protection checks as described in Section 6.4, “General Memory Protection Mechanism,” 
and do not cause the recording of reference and change information (described in 
Section 6.8.4, “Page History Recording”). Such accesses are performed as though the 
memory access mode bits (““WIM”’) were 001. That is, the cache is write-back and system 
memory does not need to be updated (W = (0), caching is enabled (I = 0), and data coherency 
is enforced with memory, I/O, and other processors (caches) (M=1 so data is global). 


Whenever an exception occurs, the MPC601 clears both the MSR[IT] and MSR[DT] bits. 
Therefore, at least at the beginning of all exception handlers (including reset), the MPC601 
operates in direct address translation mode for instruction accesses (and data accesses that 
do not map to I/O controller interface space). If address translation is required for the 
exception handler code, the software must explicitly enable address translation by 
accessing the MSR as described in Chapter 2, “Registers and Data Types.” 


Note that when translation is disabled, I/O controller interface segments can still be used 
for data accesses as the T bit of the segment registers is checked and segment registers with 
T=1 are used independently of MSR[DT]. 


Note also that an attempt to fetch from, load from, or store to a physical address that is not 
physically present in the system may cause a machine check exception (or even a checkstop 
condition), depending on the response by the system for this case. See Section 5.4.2, 
“Machine Check Exception (x’00200’),” for more information on machine check 
exceptions. 


6.7 Block Address Translation 


The block address translation (BAT) mechanism in the MPC601 provides a way to map 
ranges of logical addresses larger than a single page into contiguous areas of physical 
memory. Such areas can be used for data that is not subject to normal virtual memory 
handling (paging), such as a memory-mapped display buffer or an extremely large array of 
numerical data. 


The implementation of the block address translation in the MPC6()1 including the block 
protection mechanism is described followed by a block translation summary with a detailed 
flow diagram. 


6.7.1 BTLB Organization 


The block translation lookaside buffer (BTLB) of the MPC6()1 maintains the address 
translation information for four blocks of memory. The BTLB in the MPC601 is maintained 
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by the system software and is implemented as a set of eight special-purpose registers 
(SPRs). Each block is defined by a pair of SPRs called upper and lower BAT registers. 


The BAT registers can be read from or written to by the mfspr and mtspr instructions; 
access to the BAT registers is privileged. Section 6.7.3, “BAT Register Implementation of 
BTLB,” gives more information about the BAT registers. Note that the BTLB entries are 
completely ignored for TLB invalidate operations detected on the system bus and in the 
execution of the tlbie instruction. 


Figure 6-5 shows the organization of the BTLB. Four pairs of BAT registers are provided 
for translating instruction and data addresses. These four pairs of BAT registers comprise 
the four-entry fully-associative BTLB (each BTLB entry corresponds to a pair of BAT 
registers). The BTLB is fully-associative in that all four entries are compared with the 
logical address of the access to check for a match simultaneously. 


Unmasked bits of LAO-LA14 
(Instruction or Data Access) 


[Compare } 


[Compare } 





BTLB Hit/Miss 


Figure 6-5. BTLB Organization 


Each pair of BAT registers defines the starting address of a block in the logical address 
space, the size of the block, and the start of the corresponding block in physical address 
space. If a logical address is within the range defined by a pair of BAT registers, its physical 
address is defined as the starting physical address of the block plus the lower order logical 
address bits. 


Blocks are restricted to a finite set of sizes, from 128 Kbytes (2!7 bytes) to 8 Mbytes (277 
bytes). The starting address of a block in both logical address space and physical address 
space is defined as a multiple of the block size. 


Because the BTLD cntrics are used for both instruction and data access, if the same memory 
address is to be mapped for both instruction fetching and data load and store operations, the 
address mapping must only be loaded into one register pair. 


It is an error for system software to program the BAT registers such that a logical address 
is translated by more than one BAT pair. If this occurs, the results are undefined and may 





MOTOROLA Chapter 6. Memory Management Unit 6-25 


include a spurious violation of the memory protection mechanism, a machine check 
exception, or a check stop condition. 


6.7.2 Recognition of Addresses in BTLB 


The BTLB (BAT registers) is accessed in parallel with segmented address translation to 
determine whether a particular logical address corresponds to a block defined by the BTLB. 
If a logical address is within a valid BAT area, the physical address for the memory access 
is determined, as described in Section 6.7.5, “Block Physical Address Generation.” 


Block address translation is enabled only when address translation is enabled (MSR[IT]=1 
and/or MSR[DT]=1) and only when the indexed segment register specifies T=0. That is, the 
BAT does not apply to I/O controller interface segments (T=1). When the segment register 
has T=1, the segment register translation is used. (This is true for both I/O controller 
interface segments and memory-forced I/O controller interface segments.) 


The BAT registers and the segmented address translation mechanism can be programmed 
such that a particular logical address is within a BAT area and that logical address also has 
a segment register translation that corresponds to page address translation (T=0 in the 
segment register). When this occurs, the block address translation is used as shown in 
Table 6-10 and the segment address translation is ignored. 


Table 6-10. Address Translation Precedence for Blocks and Segments 


il a Address Translation 


Matching BTLB entry prevails 
Segment register prevails 


Additionally, a block can be defined to overlay part of a segment such that the block portion 
is non-paged although the rest of the segment is pageable. This allows non-paged areas to 
be specified within a segment, and PTEs for the part of the segment overlaid by the block 
are not required. 





6.7.3 BAT Register Implementation of BTLB 


Recall that the BTLB is comprised of four entries used for instruction accesses and data 
accesses. Each BTLB entry consists of a pair of BAT registers—an upper and a lower BAT 
register for each entry. The BAT registers are accessed with the mtspr and mfspr 
instructions and are only accessible to supervisor-level programs. See Section 3.7, 
“Processor Control Instructions,” for a list of simplified mnemonics for use with the BAT 
registers. 


Figure 6-6 shows the format of the upper BAT registers and Figure 6-7 shows the format 
of the lower BAT registers. The format of the upper and lower BAT registers in the 
MPC601 differs from that of the BAT registers in other PowerPC processors. 
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Reserved 


Figure 6-6. Format of Upper BAT Registers 


Reserved 





0 14 15 24 25 26 31 
Figure 6-7. Format of Lower BAT Registers 


The BAT registers contain the logical to physical address mappings for blocks of memory. 
This mapping information includes the logical address bits that are compared with the 
logical address of the access, the memory/cache access mode bits (WIM) and the protection 
bits for the block. In addition, the size of the block and the starting address of the block are 
defined by the block page number and block size mask fields. 


Table 6-11 describes the bits in the upper and lower BAT registers. 
Table 6-11. BAT Registers—Field and Bit Descriptions 


Presse [ee [me [i 


Upper 0-14 BLPI Block logical page index. This field is compared with bits 0-14 of the logical 
BAT address to determine if there is a hit in that BTLB entry. 
Registers 


Memory/cache access mode bits 

W_ Write-through 

! = Caching-inhibited 

M Memory coherence 

For detailed information about the WIM bits, see Section 6.3, 
“Memory/Cache Access Modes.” 


Supervisor mode key. This bit interacts with MSR[PR] and the PP field to 
determine the protection for the block. For more information, see Section 6.4, 
“General Memory Protection Mechanism." 


User mode key. This bit also interacts with MSR[PR] and the PP field to 
determine the protection for the block. For more information, see Section 6.4, 
“General Memory Protection Mechanism.” 


Protection bits fer block. This field interacts with MSR[PR] and the Ks or Ku 
to determine the protection for the block as described in Section 6.4, 
“General Memory Protection Mechanism.” 
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Table 6-11. BAT Registers—Field and Bit Descriptions (Continued) 


Cevsee[ ow [me [em 


Lower 0-14 Physical block number. This field is used in conjunction with the BSM field to 
BAT generate bits 0-14 of the physical address of the block. 


Registers 
Eee ee BAT register pair (BTLB entry) is valid if V=1 


Block size mask (0...5). BSM is a mask that encodes the size of the block. 
Values for this field are listed in Table 6-12. 






















The BSM field in the lower BAT register is a mask that encodes the size of the block. 
Table 6-12 defines the bit encodings for the BSM field of the lower BAT register. Note that 
the range of block sizes is a subset of that defined by the PowerPC architecture. 


Table 6-12. Lower BAT Register Block Size Mask Encodings 


Only the values shown in Table 6-12 are valid for BSM. A logical address is determined to 
be within a BAT area if the appropriate bits (determined by the BSM field) of the logical 
address matches the value in the BLPI field of the upper BAT register, and if the valid bit 
(V) of the corresponding lower BAT register is set. 












The boundary between the strings of zeros and ones in the BSM field determines the bits 
of the logical address that are used in the comparison with the BLPI field to determine if 
there is a hit in that BTLB entry. The rightmost bit of the BSM field is aligned with bit 14 
of the logical address; bits of the logical address corresponding to ones in the BSM field are 
then forced to zero for the comparison. 


The value loaded into the BSM field determines both the length of the block and the 
alignment of the block in both logical address space and physical address space. The values 
loaded into the BLPI and PBN fields must have at least as many low-order zeros as there 
are ones in BSM. 
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6.7.4 Block Memory Protection 


If a logical address is determined to be within a block defined by the BTLB, the access is 
next validated by the memory protection mechanism. If this protection mechanism 
prohibits the access, a block protection violation exception condition (data access 
exception or instruction access exception) is generated. 


The block protection mechanism provides protection at the granularity defined by the block 
size (128 Kbyte to 8 Mbyte) and is described in Section 6.4, “General Memory Protection 
Mechanism.” 


The Ks, Ku, and PP bits are located in the upper BAT register for block address translation. 


When the block protection mechanism prohibits a reference, one of the following occurs, 
depending on the type of access that was attempted: 


* For data accesses, a data access exception is generated and bit 4 of DSISR is set. If 
the access was a store, bit 6 of DSISR is additionally set. 


¢ For instruction accesses, an instruction access exception is generated and bit 4 of 
SRR1 is set. 


6.7.5 Block Physical Address Generation 


If the block protection mechanism validates the access, a physical address is formed as 
shown in Figure 6-8. Bits in the logical address corresponding to ones in the BSM field, 
concatenated with the 17 lower-order bits of the logical address form the offset within the 
block of memory in the case of a hit. Bits in the logical address corresponding to zeros in 
the BSM field are then logically ORed with the corresponding bits in the PBN field to form 
the next higher-order bits of the physical address. Finally, the highest-order nine bits of the 
PBN field form bits 0-8 of the physical address (PAOQ-PA8). 


Access to the physical memory within the block is made according to the memory/cache 
access mode defined by the WIM bits in the upper BAT register. These bits apply to the 
entire block rather than to an individual page and are described in Section 6.3, 
“Memory/Cache Access Modes.” 
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Logical Address 


Block Size 


Physical Block Number 


Physical Address 





Figure 6-8. Block Physical Address Generation 


6.7.6 Block Address Translation Summary 


Figure 6-9 provides the detailed flow for the block address translation mechanism. 
Figure 6-9 is an expansion of the “BTLB Hit” branch of Figure 6-3. Note that if the dcbz 
instruction is attempted to be executed with either W=1 or I =1, the alignment exception is 
generated. 
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BTLB Hit 













BLPI (0-8) = LAO-LA8, and 
BLPI (9-14) = (LA9-LA14) & (BSM), and 
Vat 


dcbz Instruction 
with W or | = 1 


Alignment 
Exception 





Select Key: 
lf MSR[PR] = 0, Key = Ks 
\f MSR[PR] = 1, Key = Ku 


otherwise Memory Protection 


Violation 
Memory Protection 
Violation Flow 


(See Figure 6-10) 





PAO-PA31 = PBN (0-8) || 
PBN (9-14) OR ((LA9-LA14) & (BSM)) || 
LA15-LAS1 


Continue Access to Cache with 
WIM in Upper BAT Register 
Figure 6-9. Block Address Translation Flow 


Figure 6-10 further expands on the determination of a memory protection violation and the 
subsequent actions taken by the processor in this case. Note that in the case of a memory 
protection violation for the attempted execution of a dcbt of dcbtst instruction, the 
translation is aborted and the instruction executes as a no-op (no violation is reported). 


6.8 Memory Segment Model 


Memory in the MPC601 is divided into sixteen 256-Mbyte segments. The segmented 
memory model of the MPC6(0)1 provides a way to map 4-Kbyte pages of logical addresses 
to 4-Kbyte pages in physical memory (page address translation), while providing the 
programming flexibility afforded by a large virtual address (52 bits). 
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Memory Protection 
Violation 


Write Access with Read Access 

Key || one any of: with Key || PP = 100 
100 
101 
111 

















otherwise dcbt/debtst 
Instruction 
Instruction Data Abort Transtation 


Access Access 


Set SRR1[4] = 1 Set DSISR[4] = 1 











Instruction Access Store 
Exception Operation 





Load 
Operation 





Set DSISR{[6] = 1 







Data Access 
Exception 





Figure 6-10. Memory Protection Violation Flow 


The page address translation mechanism may be superseded by the block address 
translation (BAT) mechanism described in Section 6.7, “Block Address Translation.” If 
. . not, the translation proceeds in two steps: from logical address to the 52-bit virtual address 
(which never exists as a specific entity but can be considered to be the concatenation of the 
virtual page number and the byte offset within a page), and from virtual address to physical 


address. 


The implementation of the page address translation mechanism in the MPC601 is described 
followed by a summary of page address translation with a detailed flow diagram: 
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6.8.1 Page Address Translation Resources 


The page address translation performed by the MPC6()1 is facilitated by the 16 segment 
registers, which provide virtual address and protection information, and by the UTLB, 
which maintains 256 recently-used page table entries (PTEs). The segment registers are 
programmed by the operating system to provide the virtual ID for a segment. In addition, 
the operating system also creates the page tables in memory that provide the logical to 
physical address mappings (in the form of PTEs) for the pages in memory. 


As shown in Figure 6-11, when an access occurs, one of the 16 segment registers is selected 
with LAQ—-LA3. For page address translation, the virtual ID field in the segment register is 
then compared with the corresponding field of the two entries in the UTLB selected by 
LA13-LA19 (one entry corresponding to set 0) and the other to set 1). In the case of a hit, 
the result of this comparison is then used to select which physical page number (PPN) (from 
set 0 or 1) to use for the access. 


Segment Registers : 
LAO-LA31 0 78 31 






Select 





LA4-LA12 






LA13-LA19 Select 


Set 1/Set 0 Hit 


PAO-—PA19 


Figure 6-11. Segment Register and UTLB Organization 


In the case of a UTLB miss, the table search hardware in the MMU automatically searches 
for the required PTE in the page tables in memory. The MMU then automatically loads the 
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UTLB with the PTE and the address translation is performed. Note that for an instruction 
access, the required PTE is also loaded into the ITLB for future use. 


If the table search operations fail to locate the required PTE, then the appropriate exception 
(instruction access exception or data access exception) is taken. See Section 6.9.2, “Page 
Table Search Operation” for more information on the context for these exception 
conditions. 


6.8.2 Recognition of Addresses in Segments 


As described in Section 6.7.2, “Recognition of Addresses in BTLB,” the block and page 
translation mechanisms operate in parallel such that if the logical address of an access hits 
in the BTLB (the address can be translated as a block address), the selected segment register 
is ignored, unless T=1 in the segment register. 


Segments in the MPC6()1 are defined as one of the following two types: 


* Memory segment—A logical address in these segments represents a virtual address 
that is used to define the physical address of the page. 


¢ J/O controller interface segment—References made to I/O controller interface 
segments use the I/O controller interface bus protocol described in Section 9.6, “I/O 
Controller Interface Operation,” and do not use the virtual paging mechanism of the 
MPC601. See Section 6.10, “I/O Controller Interface Address Translation,” for a 
complete description of the mapping of I/O controller interface segments. 


The T bit in a segment register selects between memory segments and I/O controller 
interface segments, as shown in Table 6-13. 


Table 6-13. Segment Register Types 


Segment Register 


VO controller interface segment 


The types of address translation used by the MPC601 MMU are shown in the flow diagram 
of Figure 6-4. 










6.8.2.1 Selection of Memory Segments 

_ All accesses generated by the MPC601 index into the array of segment registers and select 
one of the 16 with LAO-—LA3. If MSR[IT]=0 or MSR[DT]=0 for an instruction or data 

access, respectively, then direct address translation is performed as described in 

Section 6.6, “Direct Address Translation.” Otherwise, if T=0 for the selected segment 

register, the access maps to memory space and page address translation is performed. 


After a memory segment is selected, the MPC601 creates the 52-bit virtual address for the 
segment and searches for the PTE (first in the UTLB, then in the page tables in memory) 
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that dictates the physical page number to be used for the access. Note that I/O devices can 
easily be mapped into memory space and used as memory-mapped I/O. 


6.8.2.2 Selection of I/O Controller Interface Segments 

All data accesses generated by the MPC601 index into the array of segment registers and 
select one of the 16 with LAQ-LA3. If T=1 for the selected segment register, the access 
maps to the I/O controller interface and the access proceeds as described in Section 6.10, 
“I/O Controller Interface Address Translation.” This is true, even if data address translation 
is disabled (MSR[DT]=0). 


For the case of instruction accesses, however, the MPC601 checks the state of the MSR[IT] 
bit before checking the T bit in the segment register. If MSR[IT] = 0, direct address 
translation is performed as described in Section 6.6, “Direct Address Translation.” If 
MSRIIT] = 1 and the T bit of the selected segment register is set, then the MMU further 
checks the state of the BUID field of the segment register. If BUID has the encoding x’07F’, 
the segment is designated as a memory-forced I/O controller interface segment and the 
instruction fetch occurs as described in Section 6.10.4, “Memory-Forced I/O Controller 
Interface Accesses.” Otherwise, an instruction access exception occurs. 


6.8.3 Page Address Translation 


The first step in page address translation is the conversion of the 32-bit logical address of 
an access into the 52-bit virtual address. The virtual address is then used to locate the PTE 
either in the UTLB or in the page tables in memory. The physical page number is then 
extracted from the PTE and used in the formation of the physical address of the access. 


Figure 6-12 shows the translation of a logical address to a physical address as follows: 


° Bits 0-3 of the logical address comprise the segment register number used to select 
a segment register, from which the virtual segment ID (VSID) is extracted. 


¢ Bits 4-19 of the logical address correspond to the page number within the segment; 
these are concatenated with the VSID from the segment register to form the virtual 
page number (VPN). The VPN is used to search for the PTE in either the UTLB or 
the page table. The PTE then provides the physical page number (PPN). 


¢ Bits 20-31 of the logical address are the byte offset within the page; these are 
concatenated with the PPN field of a PTE to form the physical address used to access 
memory. 
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0 34 19 20 31 


: ; SR# API Byte Offset 
32-Bit Logical Address (4-bit)| (6-bit) | 
Page Index (16-bit) 
Segment 
Registers 


0 23 24 39 40 51 


52-Bit Virtual Address Virtual seen (VSID) Pagebt) arose 


Virtual Page Number (VPN) 


UTLB/Page 
Table 


PTE 


' : Physical Page Number (PPN Byte Offset 
32-Bit Physical Address y @0-bit) EM) 


Figure 6-12. Page Address Translation Overview 


6.8.3.1 Segment Register Definition 

The fields in the 16 segment registers are interpreted differently depending on the value of 
bit 0 (the T bit). When T=1, the segment register defines an I/O controller interface 
segment, and the format is described in Section 6.1().1, “Segment Register Format for I/O 
Controller Interface.” Figure 6-13 shows the format of a segment register used in page 
address translation (T=0). 


Reserved 





0123 7 8 31 


Figure 6-13. Segment Register Format for Page Address Translation 


Table 6-14 provides the definitions of the segment register bits for page address translation. 
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Table 6-14. Segment Register Bit Definition for Page Address Translation 


a [mm [tein 
Ce ae 
a7 = 









User-mode memory key 
VSID Virtual segment ID 


The Ks and Ku bits partially define the access protection for the pages within the segment. 
The page protection provided in the MPC601 is described in Section 6.8.5, “Page Memory 
Protection.” The virtual segment ID field is used as the high-order bits of the virtual page 
number (VPN) as shown in Figure 6-12. 





The segment registers are programmed with MPC601-specific instructions that implicitly 
reference the segment registers. The MPC601 segment register instructions are 
summarized in Table 6-15. These instructions are privileged in that they are executable 
only while operating in supervisor mode. See Section 2.3.3.1, “Synchronization for 
Supervisor-Level SPRs, and Segment Registers” for information about the synchronization 
requirements when modifying the segment registers. See Chapter 10, “Instruction Set,” for 
more detail on the encodings of these instructions. 


Table 6-15. Segment Register Instructions 
[ese [ion 
mtsr SR#,rS Move to Segment Register 
SR[SR#]<— rS 
mtsrin rS,rB Move to Segment Register Indirect 
SR[rB[O-3]]}<rS 


mfsr rD,SR# Move from Segment Register 
rD<—SR[SR#] 

mfsrin rD,rB Move from Segment Register Indirect 
rD—SRA[rB[0-3]] 


These instructions are specific to the MPC601 and not 
guaranteed on other PowerPC processors. 





6.8.3.2 Page Table Entry (PTE) Format 


Page table entries (PTEs) are generated and placed in page tabies in memory by the 
operating system using the hashing algorithm described in Section 6.9.1.3, “Hashing 
Functions.” Each PTE is a 64-bit entity (two words) that maps one virtual page number 
(VPN) to one physical page number (PPN). Information in the PTE controls the table 
search process and provides input to the memory protection mechanism. Figure 6-14 shows 
the format of both words that comprise a PTE. 
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Reserved 


0 1 24 25 26 31 





0 1920 22232425 27 28 29 3031 
Figure 6-14. Page Table Entry Format 


Table 6-16 lists the bit definitions for each word in a PTE. 
Table 6-16. PTE Bit Definitions 


Cee [a [tm [en 
[ovat aerials a) 

[oie [Pea | Pale mber 

[ae |— [Rene 
ne ee 
fe [omen 
=a aoa 

[ee __[Peesrmcontis 





All other fields are reserved. 


The PTE contains an abbreviated page index rather than the complete page index field 
because at least ten of the low-order bits of the page index are used in the hash function to 
select a PTEG address (the location of a PTE). These bits are not repeated in the PTEs of 
that PTEG. However, when a PTE is loaded into the UTLB, the entire page index (PI) field 
must be loaded into the UTLB entry. The PI field is then compared with incoming logical 
address bits LA4-LA12 (LA13-LA16 select the UTLB entries to be compared) to 
determine if there is a hit. 


The virtual segment ID field corresponds to the high-order bits of the virtual page number 
(VPN), and, along with the H bit, it is used to locate the PTE. The R and C bits maintain 
history information for the page as described in Section 6.8.4, “Page History Recording.” 
The WIM bits define the memory/cache control mode for accesses to the page. Finally, the 
PP bits define the remaining access protection constraints for the page. The page protection 
provided in the MPC601 is described in Section 6.8.5, “Page Memory Protection.” 
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Conceptually, the page table is searched by the address translation hardware to translate the 
address of every reference. For performance reasons, the UTLB maintains recently-used 
PTEs so that the table search time is eliminated for most accesses. The UTLB is searched 
for the address translation first. If the PTE is found, then no page table search is performed. 
As a result, software that changes the page tables in any way must perform the appropriate 
TLB invalidate operations to keep the UTLB (and ITLB) coherent with respect to the page 
tables. 


6.8.4 Page History Recording 


Reference (R) and change (C) bits are automatically maintained by the MPC601 in the PTE 
for each physical page (for accesses made with page table address translation) to keep 
history information about the page. This information can then be used by the operating 
system to determine which areas of memory to write back to disk when new pages must be 
allocated in main memory. Reference and change recording is not performed for 
translations made with the BAT or for accesses that correspond to I/O controller interface 
(T=1) segments. Furthermore, R and C bits are maintained only for accesses made while 
address translation is enabled (MSR[IT]=1 or MSR[DT]=1). . 


The reference and change bits are automatically updated by the MPC601 under the 
following circumstances: 
¢ For UTLB hits, if the C bit requires updating (as shown in Table 6-16). 
¢ For UTLB misses, when a table search is in progress to locate a PTE. The R and C 
bits are updated (set, if required) to reflect the status of the page based on this access. 


Table 6-17. Table Search Operations to Update History Bits—UTLB Hit Case 


es 
Write: Table search operation to update C 


Note that the processor updates the C bit based only on the status of the C bit in the UTLB 
entry in the case of a UTLB hit (the R bit is assumed to be set in the page tables if there is 
a UTLB hit). Therefore, when software clears the R and C bits in the page tables in memory, 
it must invalidate the UTLB entries associated with the pages whose reference and change 
bits were cleared. See Section 6.9.3, “Page Table Updates,” for all of the constraints 
imposed on the software when updating the reference and change bits in the page tables. 






The R bit or the C bit for a page is not set by the execution of the Data Cache Block Touch 
instructions (debt, or debtst). 
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6.8.4.1 Reference Bit 


The reference bit of a page is located both in the PTE in the page table and in the copy of 
the PTE loaded into the UTLB. Every time a page is referenced (with a read or write access) 
the reference bit is set in the page table by the MPC601. Because the reference to a page is 
what causes a PTE to be loaded into the UTLB, the reference bit in all UTLB entries is 
always set. The processor never automatically clears the reference bit. 


The reference bit is only a hint to the operating system about the activity of a page. At times, 
the reference bit may be set although the access was not logically required by the program 
or even if the access was prevented by memory protection. Examples of this include the 
following: 


¢ Prefetching of instructions not subsequently executed 
* Accesses that cause exceptions and are not completed 


6.8.4.2 Change Bit 

The change bit of a page is also located both in the PTE in the page table and in the copy 
of the PTE loaded into the UTLB. Whenever a data store instruction is executed 
successfully, if the UTLB search (for page address translation) results in a hit, the change 
bit in the matching UTLB entry is checked. If it is already set, the processor does not 
change the C bit. If the UTLB change bit is 0, it is set and a table search operation is 
performed to set the C bit also in the corresponding PTE in the page table. 


The change bit (in both the UTLB and the PTE in the page tables) is set only when a store 
operation is allowed by the page memory protection mechanism. 


The automatic update of the reference and change bits in the MPC6()1 is performed with 
single-beat read and write transactions on the bus (not with atomic read/modify/write 
operations). 


During a table search operation, PTEs are fetched as global, nonexclusive read transactions 
(not as read-with-intent-to-modify transactions). This reduces cache thrashing in other 
processors (in a multiprocessor system) caused by UTLB load operations because other 
processors do not have to invalidate their resident copies of the PTEs. The response on the 
bus to a PTE load transaction should then be exclusive (SHD signal not asserted) if no other 
processor has a copy. Because PTEs are considered as cacheable, the MESI protocol of the 
cache then ensures that coherency is maintained among multiple processors for C bit 
updates to the page tables. 


6.8.5 Page Memory Protection 


Similar to the block memory protection mechanism, the page memory protection of the 
MPC601 provides selective access to each page in memory. If a logical address is 
determined to be within a page defined by the segment registers and an entry in the UTLB, 
the access is next validated by the page protection mechanism. If this protection mechanism 
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prohibits the access, a page protection violation (data access exception or instruction access 
exception) is generated. 


When the page protection mechanism prohibits a reference, one of the following occurs, 
depending on the type of access that was attempted. 


¢ For data accesses, a data access exception is generated and bit 4 of DSISR is set. If 
the access was a store, bit 6 of DSISR is additionally set. 


¢ For instruction accesses, an instruction access exception is generated and bit 4 of 
SRR1 is set. 


See Chapter 5, “Exceptions,” for more information on these types of exceptions 


A store operation that is not permitted because of the page protection mechanism does not 
cause the change (C) bit to be set in the PTE (in either the UTLB or in the page tables in 
memory); however, a prohibited store access may cause a PTE to be loaded into the UTLB 
and consequently cause the reference bit to be set in a PTE (both in the UTLB and in the 
page table in memory). 


6.8.6 Page Address Translation Summary 


Figure 6-15 provides the detailed flow for the page address translation mechanism. The 
figure is an expansion of the “UTLB Hit” branch of Figure 6-3. The detailed flow for the 
“UTLB Miss” branch of Figure 6-3 is described in Section 6.9.2, “Page Table Search 
Operation." Note that as in the case of block address translation, if the dcbz instruction is 
attempted to be executed with either W=1 or I =1, the alignment exception is generated. 
Also note that the memory protection violation flow for page address translation is identical 
to that of the block memory protection violation and is provided in Figure 6-10. 


6.9 Hashed Page Tables 


When an access that is to be translated by the page address translation mechanism results 
in a miss in the UTLB (a PTE corresponding to the VSID of the segment register is not 
resident in either of the UTLB entries indexed by LA13—LA19), the MPC601 automatically 
searches the page tables set up by the operating system in main memory. 


The algorithm used by the processor in searching the page tables includes a hashing 
function on some of the virtual address bits. Thus, the addresses for PTEs are allocated 
more evenly within the page tables and the hit rate of the page tables is maximized. This 
algorithm must be synthesized by the operating system for it to correctly place the page 
table entries in main memory. 


This section describes the format of the page tables and the algorithm used to access them. 
In addition, the constraints imposed on the software in updating the page tables (and other 
MMU resources) are described. 
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Figure 6-15. Page Address Translation Flow—UTLB Hit 


otherwise 





PAO-PA31<-PPN]|A20-A31 









Continue Access to Cache 
with WIM from UTLB entry 





6.9.1 Page Table Definition 


The hashed page table is a variable-sized data structure that defines the mapping between 
virtual page numbers and physical page numbers. The page table size is a power of 2, and 
its starting address is a multiple of its size. 
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The page table contains a number of page table entry groups (PTEGs). A PTEG contains 
eight page table entries (PTEs) of eight bytes each; therefore each PTEG is 64 bytes long. 
PTEG addresses are entry points for table search operations. Figure 6-16 shows two PTEG 
addresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside. 


Page Table 


alee] [|| 
nese [Pelee] [11 


PTEGO 





roca [rere] T_T 


Figure 6-16. Page Table Definitions 


A given PTE can reside in one of two possible PTEGS. For each PTEG address, there is a 
complementary PTEG address—one is the primary PTEG and the other is the secondary 
PTEG. Additionally, a given PTE can reside in any of the PTE locations within an 
addressed PTEG. Thus, a given PTE may reside in one of 16 possible locations within the 
page table. If a given PTE is not resident within either the primary or secondary PTEG, a 
page table miss occurs, corresponding to a page fault condition. 


A table search operation is defined as the search of a PTE within a primary and secondary 
PTEG. When a table search operation commences, a primary hashing function is performed 
on the virtual address. The output of the hashing function is then concatenated with bits 
(some of them masked) programmed into the SDR1 register by the operating system to 
create the physical address of the primary PTEG. The PTEs in the PTEG are then checked, 
one by one, to see if there is a hit within the PTEG. In case the PTE is not located during 
this PPEG, a secondary hashing function is performed, a new physical address is generated 
for the PTEG, and the PTE is searched for again, this time using the secondary PTEG 
address. 7 
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6.9.1.1 Table Search Description Register (SDR1) 


The SDR Iregister contains the control information for the table structure in that it defines 
the highest order bits for the physical base address of the page table and it defines the size 
of the table. The format of the SDR1 register is shown in Figure 6-17 and the bit settings 
are described in Table 6-18. 


Reserved 


HTABORG HTABMASK 


0 15 16 22 23 31 





Figure 6-17. SDR1 Register Format 


Table 6-18. SDR1 Register Bit Settings 


TA 


The HTABORG field in SDR1 contains the high-order 7-16 bits of the 32-bit physical 
address of the page table. Therefore, the beginning of the page table lies on a 2!¢ byte (64 
Kbyte) boundary at a minimum. 
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A page table can be any size 2" where 16 <n < 25. The HTABMASK field in SDR1 
contains a mask value that determines how many bits from the output of the hashing 
function are used as the page table index. This mask must be of the form b’00...011...1’ (a 
string of () bits followed by a string of 1 bits). As the table size increases, more bits are 
used from the output of the hashing function to index into the table. The 1 bits in 
HTABMASK determine how many additional bits (beyond the minimum of 10) from the 
hash are used as the index; the HTABORG field must have the same number of lower-order 
bits equal to 0 as the HTABMASK field has lower-order bits equal to 1. 


6.9.1.2 Page Table Size 


The number of entries in the page table directly affects performance because it influences 
the hit ratio in the page table and thus the rate of page fault exception conditions. If the table 
is too small, not all virtual pages that have physical page frames assigned may be mapped 
via the page table. This can happen if there are more than 16 entries that map to the same 
primary/secondary pair of PTEGs; in this case, many hash collisions may occur. 


The minimum allowable size for a page table is 64 Kbytes (219 PTEGs of 64 bytes each). 
However, it is recommended that the total number of PTEGs (primary plus secondary) in 
the page table be greater than half the number of physical page frames to be mapped. While 
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avoidance of hash collisions cannot be guaranteed for any size page table, making the page 
table larger than the recommended minimum size reduces the frequency of such collisions, 
by making the primary PTEGs more sparsely populated, and further reducing the need to 
use the secondary PTEGs. 


Table 6-18 shows some example sizes for total main memory. The recommended minimum 
page table size for these example memory sizes are then outlined, along with their 
corresponding HTABORG and HTABMASK settings. Note that systems with less than 
eight Mbytes of main memory may be designed with the MPC601, but the minimum 
amount of memory that can be used for the page tables is 64 Kbytes. 


Table 6-19. Recommended Page Table Sizes (Minimum) 


aA Settings for 
Recommended Minimum Recommended Minimum 


Number of 
Memory for Mapped Number of 
Page Tables Pages PTEGs HTABORG HTABMASK 
(PTEs) 


eines [vores @ | 2% |e | ene | woot ont | 
tines 8) [sa vapes [2% |e [ vwmcow [oom 
ss pas | anones @) | _2® | 2 | veo onn_| oon 
ene [anes @%) [2° |__| wntoone | oom i 


As an example, if the physical memory size is 229 bytes (512 Mbyte), then there are 229- 
212 (4 Kbyte page size) = 217 (128 Kbyte) total page frames. If this number of page frames 
is divided by 2, the resultant minimum recommended page table size is 216 PTEGs, or 222 
bytes (4 Mbytes) of memory for the page tables. 














Total Main Memory 



















6.9.1.3 Hashing Functions 

The processor uses two different hashing functions, a primary and a secondary, in the 
creation of the physical addresses used in a page table search operation. These hashing 
functions efficiently distribute the PTEs within the page table, in that there are two possible 
PTEGs where a given PTE can reside. Additionally, there are eight possible PTE locations 
within a PTEG where a given PTE can reside. If a PTE is not found using the primary 
hashing function, the secondary hashing function is performed, and the secondary PTEG is 
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searched. Note that these two functions must also be used by the operating system to 
appropriately set up the page tables in memory. 


The use of the two hashing functions provides a high probability that a required PTE is 
resident in the page tables, without requiring the definition of all possible PTEs in main 
memory. However, if a PTE is not found in the secondary PTEG, then a page fault occurs 
and an exception is taken. Thus, the required PTE can then be placed into either the primary 
or secondary PTEG by the system software, and on the next UTLB miss to this page, the 
PTE will be found. 


The address of a page table is derived from the HTABORG field of the SDR1 register, and 
the output of the corresponding hashing function (primary hashing function for primary 
PTEG and secondary hashing function for a secondary PTEG). The value in HTABMASK 
determines how many of the higher-order hash value bits are masked and how many are 
used in the generation of the physical address of the page table. 


Figure 6-18 depicts the hashing functions used by the MPC6()1. The inputs to the primary 
hashing function are the lower-order 19 bits of the VSID field of the selected segment 
register (bits 5—23 of the 52-bit virtual address), and the page index field of the logical 
address (bits 24—39 of the virtual address) concatenated with three zero higher-order bits. 
The XOR of these two values generates the output of the primary hashing function (hash 
value 1). 
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Primary Hash: 


5 23 
Low-Order 19 Bits of VSID (From Segment Register) 
XOR 
24 39 





Page Index (From Logical Address) 


Output of Hashing Function 1 Hash Value 1 


0 8 9 18 
a) ee 


Secondary Hash: 
0 18 


Hash Value 1 


One's Complement Function 


Output of Hashing Function 2 


Figure 6-18. Hashing Functions 





Hash Value 2 


When the secondary hashing function is required, the output of the primary hashing 
function is complemented with one’s complement arithmetic, to provide hash value 2. 


6.9.1.4 Page Table Addresses 


Figure 6-19 illustrates the generation of the addresses used for accessing the hashed page 
tables defined for the MPC601. As stated earlier, the operating system must synthesize the 
table search algorithm for setting up the tables. 


As shown in Figure 6-19, two of the elements that define the 52-bit virtual address (the 
segment register VSID field and the page index field of the logical address) are used as 
inputs into a hashing function. Depending on whether the primary or secondary PTEG is to 
be accessed, the processor uses either the primary or secondary hashing function. 
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~«<q————._ Virtua! Page Number (VPN) ———————_> 
45 23 24 29 30 39 


52-Bit Virtual Address Virtual Segment !D API Byte Offset 
eh (eh 


Page Index (16-bit) 


(16-bit) 


oO 










SDR1 






XXXX XX. 2-2 ee 
(16-bit) 


Mask 9 bits 10 bits 


AND 


PAGE TABLE 
PTE7 


15 16 25 26 


PTEG Select, Select 


32-bit Physical Address 





PPN Byte Offset 
(20-bit) (12-bit) 


Figure 6-19. Generation of Addresses for Page Tables 
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The base address of the page table is defined by the higher order bits of SDR1I[HTABORG]. 
Bits 7-15 of the PTEG address are derived from the masking of the higher-order bits of the 
hash value (as defined by SDR1[HTABMASK]) concatenated with (implemented as an OR 
function) the remaining bits of SDRI[HTABORG]. Bits 16-25 of the PTEG address are the 
10 lower order bits of the hash value, and bits 26-31 of the PTEG address are zero. In the 
process of searching for a PTE, the processor first checks PTEO (at the PTEG base address). 


6.9.1.5 Page Table Structure 


In the process of searching for a PTE, the processor interprets the values read from memory 
as described in Section 6.8.3.2, “Page Table Entry (PTE) Format.” The VSID and the 
abbreviated page index (API) fields of the 52-bit virtual address of the access are compared 
to those same fields of the PTEs in memory. In addition, the valid (V) bit and the hashing 
function (H) bit are also checked. For a hit to occur, the V bit of the PTE in memory must 
be set. If the fields match and the entry is valid, the PTE is considered a hit if the H bit is 
set as follows: 


¢ If this is the primary PTEG, H=0 
¢ If this is the secondary PTEG, H=1 


The physical address of the PTEs to be checked is derived as shown in Figure 6-19, and is 
the address of a group of eight PTEs (a PTEG). During a table search operation, the 
processor first compares the PTEO location defined by the primary hashing function. If the 
VSID and API fields do not match (or if V or H are not set appropriately), the processor 
increments the lower order address bits by eight bytes and checks the PTE] location and so 
on, until all eight PTEs in the PTEG have been checked. 


If no match is found, the secondary hashing function is performed, and the secondary 
PTEG address is derived. The eight PTEs within the secondary PTEG are then similarly 
checked. If the required PTE is not found in any of the 16 possible locations (the eight PTEs 
within the primary PTEG and the eight PTEs within the secondary PTEG), then a page fault 
occurs and an exception is taken. Thus, if a valid PTE is located in the page tables, the page 
is considered resident; if no matching (and valid) PTE is found for an access, the page is 
interpreted as non-resident (page fault) and the operating system must load the PTE (and 
possibly the page) into main memory. 


Note that for performance reasons, PTEs should be allocated by the operating system first 
beginning with the PTEO locations within the primary PTEG, then PTE1, and so on. If more 
than eight PTEs are required within the address space that defines a PTEG address, the 
secondary PTEG can be used. Nonetheless, it may be desirable to place the PTEs that will 
require most frequent access at the beginning of a PTEG and reserve the PTEs in the 
secondary PTEG for the least frequently accessed PTEs. 


6.9.1.5.1 Page Table Structure Example 


Figure 6-20 shows the structure of an example page table. The base address of this page 
table is defined by bits 0-13 in SDR1[HTABORG]; note that bits 14 and 15 of HTABORG 
must be zero because the lower order two bits of HTABMASK are ones. The addresses for 
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individual PTEGs within this page table are then defined by bits 14~25 as an offset from 
bits 0-13 of this base address. Thus the size of the page table is defined as 4096 PTEGs. 


HTABORG HTABMASK 
15 23 31 


Ce TS ee aed reg ee 


Example: 
Given: SDRi |1010 0110 0000 0000 0000 0000 0000 0011 


| Base Address | 


Page Table 











$A600 0000 PTEGO 


PTEGaddr1 






PTEGaddr2 


PTEG4095 


0 14 25 31 
[> — ot 
PTEGaddr1 = 1010 0110 0000 O00aa_ = aaaa aaaa aa00 0000 


0 14 25 31 


re | 
PTEGaddr2 = 1010 0110 0000 O0bb_ bbbb bbbb bbO0O 0000 
Figure 6-20. Example Page Table Structure 


Two example PTEG addresses are shown in the figure as PTEGaddr1 and PTEGaddr72. Bits 
14-25 of each PTEG address in this example page table are derived from the output of the 
hashing function (bits 26-31 are zero to start with PTEO of the PTEG). In this example, the 
‘b’ bits in PTEGaddr2 are the one’s complement of the ‘a’ bits in PTEGaddr1. If bits 14— 
25 of PTEGaddr1 were derived by using the primary hashing function, then PTEGaddr2 
corresponds to the secondary PTEG. 
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Note, however, that the ‘b’ bits in PTEGaddr2 can also be derived from a combination of 
logical address bits, segment register bits, and the primary hashing function. In this case, 
then PTEGaddr1 corresponds to the secondary PTEG. Thus, while a PTEG may be 
considered a primary PTEG for some logical addresses (and segment register bits), it may 
also correspond to the secondary PTEG for a different logical address (and segment register 
value). 


It is the value of the H bit in each of the individual PTEs that identifies a particular PTE as 
either primary or secondary (there may be PTEs that correspond to a primary PTEG and 
PTEs that correspond to a secondary PTEG, all within the same physical PTEG address 
space). Thus, only the PTEs that have H=0 are checked for a hit during a primary PTEG 
search. Likewise, only PTEs with H=1 are checked in the case of a secondary PTEG search. 


6.9.1.5.2 PTEG Address Mapping Example 

Figure 6-21 shows an example of a logical address and how its address translation (the 
PTE) maps into the primary PTEG in physical memory. The example illustrates how the 
processor generates PTEG addresses for a table search operation; this is also the algorithm 
that must be used by the operating system in creating the page tables. 


In the example, the value in SDR1 defines a page table at address x‘OF98 0000’ that 
contains 8192 PTEGs. The example logical address selects segment register 0 (SRO) with 
the highest order four bits. The contents of SRO are then used along with bits 4-19 of the 
logical address to create the 52-bit virtual address. 


To generate the address of the primary PTEG, bits 5-23, and bits 24—39 of the virtual 
address are then used as inputs into the primary hashing function (XOR) to generate hash 
value 1. The lower order 13 bits of hash value 1 are then concatenated with the higher order 
13 bits of HTABORG, defining the address of the primary PTEG (x‘OF9F F980’). 
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HTABORG HTABMASK 

15 23 31 

l 1 Ve el 

Given: SDR1 |0000 1111 1001 1000 0000 0000 0000 0111 


Example: 


0 4 19 20 25 31 
LA=x'00FF AO1B’: 0000, 0000 1111 1111 1010, 0000 0001 1011 


Segment Register Select 
xc A 7 0 1 C’ 





Byte Offset 


SRO | 0010 0000 1100 1010 0111 0000 0001 1100 


8 31 
Virtual Address: VSID Page Index 





1100 1010 0111 0000 0001 1100 ooo00 1111 1111 1010 0000 0001 1011 


Primary Hash: 








Hash Value 1 
10-bits 


Primary PTEG Address: 


HTABORG 13 25 = Start at PTEO 


0000 1111 1001 qty 1111. 1001 1000 0000 
x' 0 F 9 F F 9 8 0’ 


Figure 6-21. Example Primary PTEG Address Generation 


Figure 6-22 shows the generation of the secondary PTEG address for this example. If the 
secondary PTEG is required, the secondary hash function is performed and the lower order 
13 bits of hash value 2 are then concatenated with the higher order 13 bits of HTABORG, 
defining the address of the secondary PTEG (x’0F98 0640’). 


As described in Figure 6-19, the 10 lower-order bits of the page index field are always used 
in the generation of a PTEG address (through the hashing function). This is why only the 
abbreviated page index (API) is defined for a PTE (the entire page index field does not need 
to be checked). For a given logical address, the lower order 10 bits of the page index (at 
least) contribute to the PTEG address (both primary and secondary) where the 
corresponding PTE may reside in memory. Therefore, if the higher order 6 bits (the API 
field) of the page index match with the API field of a PTE within the specified PTEG, the 
PTE mapping is guaranteed to be the unique PTE required. 
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Primary PTEG Address: 
HTABORG 16 25 Start at PTEO 
——_——_——— or 


0000 «61111 1001 N11 1111. 1001 1000 0000 
x 0 F 9 F F 9 8 0' 












Hash Value 2 101 100 OO = 1111 0001 1001 


10-bits 


Secondary PTEG Address: 
HTABORG 


0000 1111 1001 0000 0110 0100 0000 
x’ 0 F 9 8 0 6 4 0’ 


x'0F98 0000' PTEGO 


1) First compare 8 PTEs 


at x'OF9F F980’ xorgsoe4o' PTE} =| | | | | |pre7pteaes 


2) Then, compare 8 PTEs 
at x'0F98 0640’, 


ere) corr Faeo real [|| Frprecetes 


PTEG8191 





Figure 6-22. Example Secondary PTEG Address Generation 


Note that a given PTEG address does not map back to a unique logical address. Not only 
can a given PTEG be considered both a primary and a secondary PTEG (as described in 
Section 6.9.1.5.1, “Page Table Structure Example”), but in this example, bits 24—26 of the 
page index field of the virtual address are not used to generate the PTEG address. 
Therefore, any of the eight combinations of these bits will map to the same primary PTEG 
address. (However, these bits are part of the API and are therefore compared for each PTE 
within the PTEG to determine if there is a hit.) Furthermore, a logical address can select a 
different segment register with a different value such that the output of the primary (or 
secondary) hashing function happens to equal the hash values shown in the example. Thus 
these logical addresses would also map to the same PTEG addresses shown. 


6.9.2 Page Table Search Operation 


An outline of the table search process performed by the MPC601 in the search of a PTE is 
as follows: 


1. The 32-bit physical address of the primary PTEG is generated as described in 
Section 6.9.1.4, “Page Table Addresses”. 


2. The first PTE (PTEO) in the primary PTEG is read from memory. 
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The PTE in the selected PTEG is tested for a match with the virtual page number 
(VPN) of the access, The VPN is the VSID concatenated with the page index fields 
of the virtual address. For a match to occur, the following must be true: 

— PTE[H] =0 

— PTE[V] = 1 

— PTE[VSID] = VA[0-23} 

— PTE[API]} = VA[24-29] 

If a match is not found, step 3 is repeated for each of the other seven PTEs in the 


primary PTEG. If a match is found, go to step 8. If a match is not found within the 
8 PTEs of the primary PTEG, the address of the secondary PTEG is generated. 


. The first PTE (PTEO) in the secondary PTEG is read from memory. 
. The PTE in the selected secondary PTEG is tested for a match with the virtual page 


number (VPN) of the access. For a match to occur, the following must be true: 
— PTE[H]=1 

— PTE[V]=1 

— PTE[VSID] = VA[0-23] 

— PTE[API] = VA[24-29] 

If a match is not found, step 6 is repeated for each of the other seven PTEs in the 
secondary PTEG. 


If a match is found, the PTE is written into the UTLB and the R bit is updated in the 
PTE in memory (if necessary). If there is no memory protection violation, the C bit 
is also updated in memory and the table search is complete. 


If a match is not found within the 8 PTEs of the secondary PTEG, the search fails, 
and a page fault exception condition occurs (either an instruction access exception 
or a data access exception). 


Reads from memory for table search operations are performed as global (but not exclusive), 
cacheable operations, and are loaded into the on-chip cache of the MPC601. 


Figure 6-23 and Figure 6-24 provide detailed flow diagrams of the table search operations 
performed by the MPC601. Figure 6-23 shows the case of a dcbz instruction that is 
executed with W=1 or I =1, and that the R bit is updated in memory (if required) before the 
alignment exception occurs. The R bit is also updated (if required) in the case of a memory 
protection violation except for the case of a debt or a debtst instruction. If either of these 
instructions is executed and a protection violation occurs, the translation is simply aborted, 
the R bit is not set in memory and the instruction execution becomes a no-op (not shown in 
the figure), 
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Figure 6-23. Primary Table Search Flow 
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Secondary Table Search 


Generate PA using Secondary Hash Function 
PA < Base PA of PTEG 
(PA26—PA31=0) 





Fetch PTE from PTEG 






PA — PA+8 
(Fetch next PTE in PTEG) 






Single-beat read from PA 
Fetch PTE (64 bits) 


> 
> 


PTE [VSID, API, H, Vj= 
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Search Hit 


(See Figure 6-23) 





otherwise 
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Set SDISR[1]=1 


Data Access 
Exception 


Figure 6-24. Secondary Table Search Flow 











Set SRAI[1]=1 


Instruction Access 
Exception 


6.9.3 Page Table Updates 


This section describes the requirements on the software when updating page tables in 
memory via some pseudo-code examples. In a multiprocessor system the rules described 
in this section must be followed so that all processors operate with a consistent set of page 
tables. Even in a single processor system, certain rules must be followed, regarding 
reference and change bit updates, because software changes must be synchronized with 
automatic updates made by the hardware. Updates to the tables include the following 
operations: 


¢ Adding a PTE 
¢ Modifying a PTE, including modifying the R and C bits of a PTE 
¢ Deleting a PTE 


PTEs must be ‘locked’ on multiprocessor systems. Access to PTEs must be appropriately 
synchronized by software locking of (i.e., guaranteeing exclusive access to) PTEs or 
PTEGs if more than one processor can modify the table at that time. In the examples below, 
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“lockQ” and “unlock()” refer to software locks that must be performed to provide exclusive 
status for the PTE being updated. See Appendix G, “Synchronization Programming 
Examples,” for more information about the use of the lwarx and stwex. instructions to 
perform software interlocks. 


On single processor systems, PTEs need not be locked. To adapt the examples given below 
for the single processor case, simply delete the “lock()” and “unlockQ” lines from the 
examples. The sync instructions shown are required even for single processor systems. 


The UTLB (and ITLB) are non-coherent caches of the page tables. UTLB entries must be 
flushed explicitly with the TLB invalidate entry instruction (tlbie) whenever the 
corresponding PTE is modified. In a multiprocessor system, the tlbie instruction must be 
controlled by software locking, so that the tlbie is issued on only one processor at a time. 
The sync instruction causes the processor to wait until the TLB invalidate operation in 
progress by this processor is complete. 


The PowerPC architecture defines the tlbsync instruction (an illegal instruction in the 
MPC6()1) that ensures that TLB invalidate operations executed by this processor have 
caused all appropriate actions in other processors on the system bus. In a system that 
contains both MPC6()1 processors and other PowerPC processors, the tlbsync functionality 
must be emulated for the MPC601 in order to ensure proper synchronization with the other 
PowerPC processors. 


Any processor, including the processor modifying the page table, may access the page table 
at any time in an attempt to reload a UTLB entry. An inconsistent page table entry must 
never accidentally become visible; thus there must be synchronization between 
modifications to the valid bit and any other modifications. This requires as many as two 
sync operations for each PTE update. 


The MPC6()1 writes reference and change bits with unsynchronized, atomic byte store 
operations. Note that the V, R, and C bits each resides in a distinct byte of a PTE. Therefore, 
extreme Care must be taken to ensure that no store operation inadvertently overwrites one 
of these bytes. 


6.9.3.1 Adding a Page Table Entry 


Adding a page table entry requires only a lock on the PTE in a multiprocessor system. The 
bytes in the PTE are then written, except for the valid bit. A syne instruction then ensures 
that the updates have been made to memory, and lastly, the valid bit is set. 


lock(PTE) 

PTE[VSID,H,API] — new values 
PTE[PPN,R,C,WIM,PP] <— new values 
sync 

PTE[V] <1 

unlock(PTE) 
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6.9.3.2 Modifying a Page Table Entry 


This section describes several scenarios for modifying a PTE. 


6.9.3.2.1 General Case 

In the general case, a currently-valid PTE must be changed. To do this, the PTE must be 
locked, marked invalid, flushed from the TLB, updated, marked valid again, and unlocked. 
The sync instruction must be used at appropriate times to wait for modifications to 
complete. | 


Note that the tIbsync and the sync instruction that follow are only required if compatibility 
is must be maintained with other PowerPC processors that implement the tlbsyne 
instruction. The tlbsync instruction is not implemented in the MPC601 but can be emulated 
in the illegal instruction exception handler. 


lock(PTE) 

PTE[V]<— 0 

sync 

tibie(PTE) 

sync 

tibsync 

sync 

PTE[VSID,H,API] — new values 
PTE[PPN,R,C,WIM,PP] <— new values 
sync 

PTE[V] < 1 

unlock(PTE) 


6.9.3.2.2 Clearing the Reference (R) Bit 


When the PTE is modified only to clear the R bit to 0, a much simpler algorithm suffices 
because the R bit need not be maintained exactly. 


lock(PTE) 

oldR — PTE[R] 

PTE[R] — 0 

if oldR = 1, then tlbie(PTE) 
unlock(PTE) 


Since only the R and C bits are modified by the processor, and since they reside in different 
bytes, the R bit can be cleared by reading the current contents of the byte in the PTE 
containing R (bits 16-23 of the second word), ANDing the value with x‘FE’, and storing 
the byte back into the PTE. © 


6.9.3.2.3 Modifying the Virtual Address 
If the virtual address is being changed to a different address within the same hash class 
(primary or secondary), the following flow suffices: 
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lock(PTE) 

val — PTE[VSID,API,H, V] 
val — new VSID 
PTE[VSID,API,H,V] < val 
sync 

tibie(PTE) 

sync 

tlbsync 

sync 

unlock(PTE) 


In this pseudo-code flow, note that the store into the first word of the PTE is performed 
atomically. Also, the tlbsyne and the sync instruction that follow are only required if 
compatibility is must be maintained with other PowerPC processors that implement the 
tibsync instruction. The tlbsync instruction is not implemented in the MPC601 but can be 
emulated in the illegal instruction exception handler. 


6.9.3.3 Deleting a Page Table Entry 


In this example, the entry is locked, marked invalid, invalidated in the TLBs, and unlocked. 


Again, note that the tlbsyne and the sync instruction that follow are only required if 
compatibility is must be maintained with other PowerPC processors that implement the 
tlbsync instruction. The tlbsync instruction is not implemented in the MPC6()1 but can be 
emulated in the illegal instruction exception handler. 


lock(PTE) 
PTE[V] — 0 
sync 
tibie(PTE) 
sync 
tibsync 
sync 
unlock(PTE) 


6.9.4 Segment Register Updates 


There are certain synchronization requirements for using the move to segment register 
instructions. These are described in Section 2.3.3.1, “Synchronization for Supervisor-Level 
SPRs, and Segment Registers.” 


6.10 I/O Controller Interface Address Translation 


An I/O controller interface segment is a mapping of logical addresses to the I/O controller 
interface bus protocol. I/O controller interface segments are provided for POWER 
compatibility. Applications that require low-latency load/store access to external address 
space should use memory-mapped I/O, rather than the I/O controller interface. 
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A logical address within the I/O controller interface space corresponds to a segment register 
which has T=1. For more details about memory references to I/O controller interface 
segments, refer to Chapter 9, “System Interface Operation.” 


As a subset of I/O controller interface address translation, the MPC6()1 also provides a way 
to force I/O controller interface accesses to be made to memory. This memory-forced I/O 
controller interface capability allows a 256-Mbyte segment of memory to be mapped with 
only one segment register and no page translation overhead. Note that this functionality 
may not be provided in other PowerPC processors. 


6.10.1 Segment Register Format for I/O Controller Interface 
Figure 6-25 shows the register format for the segment registers when the T bit is set. 


BUID Controller Specific 


0123 1112 31 


Figure 6-25. Segment Register Format for I/O Controller Interface 
Table 6-20 shows the bit definitions for the segment registers when the T bit is set. 


Table 6-20. Segment Register Bit Definitions for I/O Controller Interface 


[ee | te [in 
ofr frarsacs soma | 
fis [Sirs menos _| 
[2 [seer rsermenarioy 
fai favo fawn 


12-31 Controller | Device-dependent data for I/O 
specific | controller 


6.10.2 I/O Controller Interface Accesses 


When the address translation process determines that the segment has T=1, I/O controller 
interface address translation is selected and any match due to block address translation (see 
Section 6.7, “Block Address Translation’) is ignored. Additionally, no reference is made 
to the page tables. The following data is sent to the memory controller in the protocol (two 
packets consisting of address-only cycles) described in Section 9.6, “I/O Controller 
Interface Operation”: 


* Packet0 
— One of the Kx bits (Ks or Ku) is selected to be the key as follows: | 
~ For supervisor accesses (MSR[PR]=0), the Ks bit is used and Ku is ignored 
— For user accesses (MSR[PR]=1), the Ku bit is used and Ks is ignored 
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— The contents of bits 3-31 of the segment register, which is the BUID field 
concatenated with the “controller-specific” field. 


¢ Packetl—SR[28—31] concatenated with the 28 lower-order bits of the logical 
address, LA4—LA31. 


The WIM bits for I/O controller interface accesses are forced to b’0)10’. Some instructions 
cause multiple address/data transactions to occur on the bus. The address for each 
transaction is handled individually with respect to the MMU. 


6.10.3 1/O Controller Interface Segment Protection 


Page-level protection as described in Section 6.8.5, “Page Memory Protection,” is not 
provided by the MPC601 for I/O controller interface segments. The appropriate key bit (Ks 
or Ku) from the segment register is sent to the memory controller, and the memory 
controller implements any protection required. Frequently, no such mechanism is provided; 
the fact that a I/O controller interface segment is mapped into the address space of a process 
may be regarded as sufficient authority to access the segment. 


6.10.4 Memory-Forced I/O Controller Interface Accesses 


The MPC601 performs memory-forced I/O controller interface accesses when the T bit in 
the selected segment register is set and the BUID field in the segment register is x’07F’. In 
this case, the processor bypasses all protection mechanisms and generates a memory access 
with the physical address specified by the lowest-order four bits in the segment register 
(SR[28-31]) concatenated with LA4—LA31. In this case, the processor assumes the WIM 
bits to be ‘011’, denoting the access as cache-inhibited and global. 


6.10.5 Instructions Not Supported in I/O Controller Interface 
Segments 


The following instructions are not supported when issued with a logical address that selects 
a segment register that has T=1: 


e lwarx 
* stwex. 
* Iscbx 


If one of the above instructions is executed with a logical address corresponding to a 
segment with T=1, a data access exception occurs and DSISR[5] is set. 


The following instructions are not supported at all and cause boundedly undefined results 
when issued with a logical address that selects a segment register that has T=1(or when 
MSR[DT]=0): 


* eciwx 
* ecowx 
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6.10.6 Instructions with No Effect in I/O Controller Interface 
Segments 

The following instructions are executed as no-ops when issued with a logical address that 
selects a segment where T=1: 

°¢ dcbt 

¢ dcbtst 

¢ dcbf 

¢ debi 

° dcbst 

° dcbz 


6.10.7 I/O Controller Interface Summary Flow 


Figure 6-26 shows the flow used by the MMU when I/O controller interface address 
translation is selected. This figure expands the I/O Controller Interface Translation stub 
found in Figure 6-4 for both instruction and data accesses. 
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Figure 6-26. !/O Controller Interface Translation Flow 
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Chapter 7 
Instruction Timing 


This chapter describes instruction prefetch and execution through all of the execution units 
of the MPC601 processor. It also provides examples of instruction sequences showing 
concurrent execution and various register dependencies to illustrate timing interactions. 
Bus signals described in this chapter are only accurate to within half-clock cycle 
increments. Refer to Chapter9, “System Interface Operation,” for more specific 
information regarding bus operation timing. Instruction mnemonics used in this chapter can 
be identified by referring to Chapter 10, “Instruction Set.” 


7.1 Instruction Timing Overview 


The MPC601 processor has been designed to minimize average instruction execution 
latency. Latency is defined as the number of clock cycles necessary to execute an 
instruction and make ready the results of that execution for a subsequent instruction. For 
the majority of instructions in the MPC601, this can be simplified to include only the 
execute phase for a particular instruction. However, data access instructions require 
additional clock cycles between the execute phase and the writeback phase due to memory 
latencies. 


In accordance with this definition, logical, bit-field, and most integer instructions have a 


latency of one clock cycle (for example, results for these instructions are ready for use on .- 


the next clock cycle after issue). Other instructions, such as the integer multiply, require 
more than one clock cycle to complete execution. 


Effective throughput of more than one instruction per clock cycle can be realized by the 
many performance features in the MPC6()1 including pipelining, superscalar instruction 
issue, branch acceleration, and multiple execution units that operate independently and in 
parallel. 


Many of the execution units on the MPC601 are said to be pipelined. This implies that the 
particular execution unit is broken into stages. Each stage performs a specitic step, which 
contributes to the overall execution of an instruction. The pipelined design is analogous to 
an assembly line where workers perform a specific task and pass the partially complete 
product to the next worker. 
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When an instruction is issued to a pipelined execution unit, the first stage in the pipeline 
begins its designated work on that instruction. As an instruction is passed from one stage in 
the pipeline to the next, evacuated stages may accept new instructions. This design allows 
a single execution unit to be working on several different instructions simultaneously. Once 
the pipeline has been filled with instructions, the execution unit completes a multi-cycle 
instruction every clock. 


Figure 7-1 shows a graphical representation of a generic pipelined execution unit. 
CLOCK 0 (STAGE 1) A | | (STAGE 2) (STAGE 3) 


CLOCK 1 (STAGE 1) B] | (STAGE 2) A | | (STAGE 3) 
CLOCK2 | (STAGE 1) C||(STAGE2) B| | (STAGE3) A 


CLOCK 3 (STAGE 2) A| | (STAGE 3) B 
Figure 7-1. Pipelined Execution Unit 


If the number of stages in each pipeline is equal to the total latency in clock cycles of its 
respective execution unit, the processor can continuously issue instructions to the same 
execution unit without stalling. Thus, when enough instructions have been issued to an 
execution unit to fill its pipeline, the first instruction will have completed execution and 
exited the pipeline, allowing subsequent instructions to be issued into the tail of the pipeline 
without interruption. 


The MPC601 is capable of retiring three instructions on every clock cycle. In general, 
instruction processing is accomplished in four stages—the prefetch stage, the decode stage, 
the execute stage, and the writeback stage. The instruction prefetch stage includes the clock 
cycles necessary to request instructions from the on-chip cache as well as the time it takes 
the on-chip cache to respond to that request. The decode stage consists of the time it takes 
to fully decode the instruction. Each of the three execution units on the MPC601 implement 
this general pipeline model slightly differently. These details are explained in the following 
paragraphs. 


In the writeback stage, results are returned to the register file. This stage generally does not 


contribute to the overall execution time. Instructions are prefetched and executed 


concurrently with the execution and writeback of previous instructions producing an 
overlap period between instructions. 


7.2 Timing Considerations of the MPC601 


A superscalar machine is one that can issue multiple instructions concurrently from a 
conventional linear instruction stream. The MPC601 is a true superscalar implementation 
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of the PowerPC architecture since three instructions can be issued to multiple execution 
units during each clock cycle. Although a superscalar implementation complicates 
instruction timing, these complications are largely transparent to the software. The 
MPC6()1 provides the logical functionality of issuing only a single instruction at a time, 
while providing the increased performance of issuing multiple instructions at a time. 


The execution unit pipelines are hardware interlocked, therefore, data dependencies 
automatically stall instruction issue without software assistance. This hardware 
interlocking mechanism eliminates the need to schedule wasteful no-op instructions. 


When an instruction is issued, the register file places the appropriate source data on the 
appropriate source bus. The corresponding execution unit then reads the data from the bus. 
The register files and source buses have sufficient bandwidth to sustain the peak issue rate 
of three instructions per clock. 


The MPC601 contains the following execution units that operate independently and in 
parallel: 


¢ Branch processing unit (BPU) 
¢ 32-bit integer unit (IU) 
¢ 64-bit floating-point unit (FPU) 


When the IU finishes executing an instruction, it places the resulting data, if any, onto one 
of the writeback buses. The results are then stored into the correct general-purpose register. 
If a subsequent instruction is waiting for this data, it is forwarded past the register file, 
directly into the appropriate execution unit for the immediate execution of the waiting 
instruction. This allows a data-dependent instruction to be decoded without waiting for the 
data to be written into the register file and then read back out again. This feature, known as 
feed forwarding, significantly shortens the time the machine may stall on data 
dependencies. 


When the FPU finishes executing an instruction, it places the resulting data, if any, onto one 
of the writeback buses. The results are then stored into the correct floating-point register. If 
a subsequent instruction is waiting for this data, that instruction must wait for the data to be 
written into the floating-point register file. On the next clock cycle, the following 
instruction may begin decode. In other words, the floating-point execution unit is not 
equipped with a feed-forwarding mechanism. The exception to this point is when a floating- 
point instruction is waiting for data from a floating-point load operation. In this case, the 
floating-point operation may begin decode during the same clock cycle as the floating-point 
register file is being updated. 


When the BPU finishes executing an instruction, it places the resulting data, if any, onto 
one of the writeback buses. The results are then stored into the correct special-purpose 
register. If a subsequent instruction is waiting for this data, it is forwarded past the register 
file, directly into the appropriate execution unit for the immediate execution of the waiting 
instruction. This allows a data-dependent instruction to be decoded without waiting for the 
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data to be written into the register file and then read back out again. The feed forwarding 
feature significantly shortens the time the machine may stall on data dependencies. 


7.2.1 Instruction Queue (IQ) 


The instruction queue (IQ) contains instructions prefetched from the current instruction 
stream. Instructions enter the IQ and are issued to the various execution units from the IQ. 
The IQ is an eight-entry queue, which is the backbone of the master pipeline for the 
microprocessor. The MPC601 tries to keep the IQ full at all times. As previously 
mentioned, a maximum of three instructions can be dispatched during a single clock cycle. 
If while topping off the IQ, the request for new instructions misses in the on-chip cache, 
then a memory access will only occur if the IQ is half empty. In other words, if the IQ is 
only trying to fill its top half (only needs one to four instructions), and that instruction 1s 
not found in the on-chip cache, a memory access will not occur. However, if the IQ is trying 
to fill the top five entries (5 instructions are needed), and those instructions are not found 
in the on-chip cache, arbitration for a memory access will begin. The figures in this chapter 
show the changes in the IQ due to the execution of the flow-control instructions; they do 
not show the dynamic state of the IQ or the “topping off” effect. 


Instructions enter the IQ through entry 7 and filter down to entry 0. The prefetch bus 
between the IQ and the on-chip cache is wide enough for eight instructions to be brought 
into the IQ simultaneously; that is, the IQ can go from being completely empty to 
completely full in one clock cycle. Note that although a maximum of eight instructions can 
be brought in from the on-chip cache in a single clock cycle, some restrictions do occur. 
Specifically, only the instructions between the instruction requested and the last word in the 
mod-32 aligned block are fed into the IQ. For example, if the BPU requests a block of 
instructions starting at address x’10’, then instructions contained in the block from x’10’ to 
x’2()’ will be sent to the IQ. 


Each of the execution units pulls instructions out of the IQ from specified entries. For 
example, integer instructions are only dispatched from the IQ through entry 0. In fact, IQ- 
() is also the decode stage for the integer unit. Floating-point instructions may be dispatched 
to the FPU from entries 0-3. The branch processing unit also may pull instructions from 
the IQ from entries ()-3. 


7.2.2 General Instruction Flow 


Instructions are said to “issue” from the IQ to the appropriate execution unit. Although 
there are only three execution units that pull instructions out of the IQ, each execution unit 
may have several paths from which to pull instructions out of the IQ. Figure 7-2 shows how 
instructions can be pulled from the IQ and how those instructions progress through the 
various execution units. Note that Figure 7-2 is not a data flow diagram, rather it simply 
shows the various stages in the processor. 
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Figure 7-2. Instruction Stages 
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Once an instruction has been pulled from the IQ by the branch processing unit, the 
remaining instructions found above the one just pulled by the BPU will be shifted down by 
one element. Once the BPU has pulled an instruction from the IQ, that instruction is placed 
into the decode/execute stage of the BPU. The branch is either executed and resolved, or is 
predicted. Once a branch instruction has been executed, it may need to update a special 
purpose register. In that case, the branch instruction will do its writeback sometime after 
the decode/execute phase. If no writeback is needed, the branch instruction is retired. 


An integer instruction cannot issue to the IU until that instruction has filtered to the bottom 
element of the IQ. Once the integer instruction has been properly decoded, then it moves 
into the next phase of the IU pipeline. If the execute phase of the IU is not available, then 
the integer instruction may be issued into a one-entry buffer that lies between IQ-0 (the 
decode stage for the integer unit) and the execute phase of the integer unit. If the execute 
phase of the IU is available, the instruction will fall through the integer unit buffer into the 
execute phase with no delay. It is important to note that if a data dependency is present, then 
the instruction will be held in the execute stage of the YU pipeline. Once the data 
dependency is resolved, the pending instruction will continue execution and will proceed 
thought the IU pipeline. This is important because if an instruction stalls in the IU pipeline, 
it will stall in the execute stage, leaving the decode stage and the buffer open for subsequent 
integer instructions to be issued to. Once execution is complete, the integer instruction is 
moved into the writeback stage where results are written into the register file. 
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Floating-point instructions can issue to the FPU from any one of the bottom four elements 
of the IQ. Once a floating-point instruction has been pulled from the IQ, it moves into the 
floating-point buffer. The floating-point execution unit contains a one-entry buffer that lies 
between the IQ and the floating-point decode stage. All floating-point instructions must 
spend at least one clock cycle in the floating-point buffer. If the decode stage remains 
occupied after an instruction has already spent one clock cycle in the floating-point buffer, 
then the floating-point instruction will remain in the buffer. If both the decode stage and the 
buffer are occupied, then no floating-point instruction may be pulled from the IQ. The one- 
entry buffer in the FPU pipeline helps minimize any penalties when certain instructions do 
not move from one stage in the pipeline to another in a single clock cycle. 


Certain floating-point instructions may spend multiple clock cycles in the decode/execute 
phases of the floating-point unit. When a double-precision multiply instruction is 
encountered, it will spend a minimum of two clock cycles in each phase of the floating- 
point execution unit pipeline—two clock cycles in the decode phase, a minimum of two 
clock cycles in FPU execute 1, and a minimum of two clock cycles in FPU execute 2. When 
one of these instructions enters the floating-point execution unit pipeline, previous floating- 
point pipeline stages (except for the FPU buffer) become unavailable. For example, when 
a double-precision multiply (or single- or double-precision divide) instruction moves from 
the FPU decode stage into the FPU execute stages, no other FPU instruction may enter the 
FPU decode stage until that double-precision instruction has moved out of the FPU execute 
stages into the writeback stage. 


7.2.3 Instruction Prefetch Timing 
The timing of the prefetch mechanism on the MPC601 depends heavily on the state of the 


on-chip cache. There are two factors that determine how quickly the cache responds to the 
BPU’s request for additional instructions: 


¢ Is the cache currently serving a previous request? 
¢ Is the instruction being asked for in the on-chip cache (cache hit) or will a memory 
transaction need to be initiated to bring the data into the cache (cache miss)? 


These issues are discussed further in the following sections. 


7.2.3.1 Cache Arbitration 


When the branch processing unit attempts to prefetch instructions from the on-chip cache, 
the cache may or may not be able to immediately respond to the request. There are four 
scenarios that may be encountered by the BPU when it requests instructions from the on- 
chip cache. 


The first scenario is when the on-chip cache is idle and a request comes in from the BPU 
for additional instructions. In this case, the on-chip cache responds with the requested 
instructions on the next clock cycle. 
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The second scenario occurs if at the time the BPU requests instructions, the on-chip cache 
is busy. A busy state may occur due to accesses in progress to/from memory or when 
snooping cache change activity is in progress. When this case arises, the on-chip cache may 
be inaccessible for one or two clock cycles, depending on the exact state of the memory 
access that is in progress. 


The third scenario occurs if the integer unit has any pending data access operations. In this 
case, priority is always given to the pending IU data accesses. As aresult, the BPU may see 
a delay in the response to its request for instructions. In addition, if one of these pending 
data access operations will cause a cache miss, one of the previously described scenarios 
may also occur. 


Note that if the on-chip cache is servicing a previous access that results in a cache hit, no 
delay is seen by the BPU. 


7.2.3.2 Cache Hit 


Assuming that the branch processing unit gains control of the on-chip cache and the 
instructions it needs are in the on-chip cache (a cache hit has occurred), there will only be 
one clock cycle between the time that the BPU requests the instructions and the time that 
the instructions enter the IQ. As previously stated, any number of instructions between one 
and eight can be simultaneously fetched from the on-chip cache and loaded into the IQ. 


Figure 7-3 shows a brief example of an instruction prefetch that hits in the on-chip cache 
and how that prefetch affects instruction issue. In this example, eight instructions are fed 
into the IQ during clock cycle (). 


During clock cycle 1, instruction (0) is decoded and instruction 1 is fed into the floating-point 
buffer. Notice that the branch instruction (instruction 4) is still not within the bottom four 
elements of the IQ; thus it may not begin its decode/execute phase. 


During clock cycle 2, another integer instruction and floating-point instruction are pulled 
from the IQ. In addition, the branch instruction is now within the bottom four elements of 
the IQ, thus it may be pulled out of the IQ into the branch pipeline. Notice that the branch 
pipeline has a combined decode/execute stage. The BPU is immediately able to determine 
that the branch will indeed change program flow and sends a request to the on-chip cache 
for the new instruction stream. 


During clock cycle 3, the new instructions arrive in the IQ. Note that instructions 5, 6, and 
7 are never decoded and are discarded (because of the taken branch) when the new set of 
instructions is brought into the IQ. 


During clock cycles 4 through 8, the appropriate instructions move through the various 
pipelines toward completion. As the IQ is emptied into the individual execution unit 
pipelines, additional instructions will be requested from the on-chip cache. 
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Figure 7-3. Instruction Timing—Cache Hit 


7.2.3.3 Cache Miss 


Assuming that the BPU gains control of the on-chip cache and the instructions that it needs 
are not in the on-chip cache (a cache miss has occurred), there will only be seven clock 
cycles between the time that the BPU requests the instructions and the time that the 
instructions are available for decode. These seven clock cycles do not take into account any 
wait states that may be present in the memory system. 





Figure 7-4 shows a brief example of an instruction prefetch that misses in the on-chip cache 
and how that prefetch affects the instruction issue. In this example, eight instructions are 
fed into the IQ during clock cycle 0). 


During clock cycle 1, instruction () is decoded and instruction 1 is fed into the floating-point 
buffer. Notice that the branch instruction (instruction 4) is still not within the bottom four 
elements of the IQ, thus it may not begin its decode/execute phase. 
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Figure 7-4. instruction Timing—Cache Miss 


During clock cycle 2, another integer instruction and floating-point instruction are pulled 
from the IQ. In addition, the branch instruction is now within the bottom four elements of 
the IQ, thus may be pulled out of the [Q into the branch pipeline. Notice that the branch 
pipeline has a combined decode/execute stage. The BPU is immediately able to determine 
that the branch will indeed change program flow and sends a request to the on-chip cache 
for the new instruction stream. 


During clock cycle 3, the on-chip cache misses the access and determines that a memory 
access will have to occur. During clock cycle 4 the address of the block of instructions is 
applied to the systcm bus. During clock cycle 5 two instructions (64 bits) are returned from 
memory. The instructions are not fed directly into the on-chip cache as they are received 
from the memory system, but they are buffered into groups of 128 bits. 


During clock cycle 7, the first 128 bits of instructions have been received and a request is 
placed to the on-chip cache for access, in order to actually update the cache with the new 
instructions. Also during clock cycle 7, the third pair of instructions is being received from 
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memory. During clock cycle 8, the request for access to the on-chip cache is acknowledged 
and the first four instructions are fed into the on-chip cache and into the IQ, as required. 
Also during clock cycle 8, the fourth pair of instructions is received from memory. 


During clock cycle 9, another request for access is made to the on-chip cache to update the 
cache with the second four instructions received from memory. Also during clock cycle 9, 
instructions 8 and 9 are pulled from the IQ into the appropriate execution units. 


During clock cycle 10, the request for access to the on-chip cache is acknowledged and the 
second four instructions are fed into the on-chip cache and into the IQ, as required. Also 
during clock cycle 10, two more instructions are pulled from the IQ. 


During clock cycle 11, instructions 12-15 are fed into the IQ. During the following clock 
cycles, these instructions move through the appropriate pipelines toward completion. 


7.2.4 Instruction Decode Timing 


Most instructions can be decoded in one clock cycle on the MPC6()1. In addition, recall that 
the BPU may decode and execute all instructions in a single clock cycle. Although an 
instruction may be decoded in one clock cycle, other factors may keep the instruction from 
moving to the next stage in the pipeline. Those factors include dependencies on source 
operands, dependencies on registers being available to act as the instruction’s destination, 
and the data type of the operands. 


If some dependency exists that may preclude an instruction from beginning execution, that 
instruction may stall in a different stage of its pipeline depending on the type of instruction. 
For example, if it is a floating-point instruction, the instruction is held in the decode stage 
of the floating-point pipeline. If the data that the floating-point operation depends upon is 
returned via a cache access, the decode may begin during the same clock cycle that the 
floating-point register file is being updated. However, if the data that the floating-point 
operation depends upon is returned via the result of a previous floating-point operation, 
then the decode will begin the clock cycle after the floating-point register file is updated. 


If the instruction that has a data dependency is an integer instruction, the instruction is fully 
decoded and may be moved into the execute stage of the integer pipeline where it will wait 
for the source data to become available. The integer instruction will begin execution during 
the same clock cycle as the update to the general-purpose register file. 


If a flow-control operation has a data dependency on the condition register, the instruction 
will be predicted during the decode/execute phase in the BPU. 


If the instruction is a floating-point multiply operation with double-precision operands, 
then that instruction will spend a minimum of two clock cycles in the decode stage of the 
floating-point pipeline. 
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7.2.4.1 Source Register Considerations 


If an instruction attempts to use a source operand that is still being computed by a previous 
instruction, a data dependency exists. When a data dependency exists, instruction 
acceptance into that execution unit is halted until all of the necessary source data is 
available. The MPC60)1 uses a hardware mechanism to keep track of which registers are 
available for use. 


Data access instructions follow a unique set of rules for out-of-order issue and completion. 
Only one memory access instruction can be issued per clock cycle; however, store 
instructions may be issued before the data being stored is available. This allows continued 
issuance and execution of other instructions in parallel with the computation of the source 
data for the store operation. For example, assume a floating-point divide that will update 
register 5 begins execution during clock cycle 0. The results for that divide operation (fr5) 
will not be available for other instructions to use for a number of clock cycles. Now assume 
that a store of fr5 to address 0x’1000’ immediately follows the divide operation in the 
instruction stream. Rather than stalling instruction processing while the store operation 
waits for fr5 to become available, the address of the store is calculated and the store is 
moved into a write buffer. By removing the waiting store from the integer execution 
pipeline, the IU can process additional instructions while the store waits in the write buffer 
for fr5 to become available. 


Note that address operands for store operations are vulnerable to source register checks. For 
example, assume a load that will update register 5 begins execution during clock cycle 0. 
The results for that load operation (15) will not be available for other instructions to use for 
a number of clock cycles. Now assume that a store of r7 to the address contained in r5 
immediately follows the load operation in the instruction stream. Rather than issuing the 
store into a write buffer, the store operation will stall in the [U’s execute stage. In other 
words, for a store to be moved into the write buffer, the address calculation must be 
complete. The address calculation is not able to complete unless all address source 
operands are available. 


Additionally, load instructions may bypass store instructions that are pending as long as the 
address being accessed by the load instruction does not match that being accessed by any 
pending store instruction. 


7.2.4.2 Destination Register Considerations 

The following paragraphs describe how the MPC601 prevents destination registers from 
being overwritten by out-of-sequence instructions and how instructions are prioritized for 
writing back to the register files. 


In a machine that allows instructions to issue, execute, and complete out of order, there is 
the potential for an instruction's result to be overwritten by an instruction that issued later, 
but appears earlier in the instruction stream. Consider the following example: given two 
instructions, instruction_a and instruction_b, where instruction_a occurs before 
instruction_b in the instruction stream. Now, assume that instruction_a is decoded and 
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begins execution during clock cycle 0. Instruction_a completes execution and is ready to 
update general purpose register 30 on clock cycle 6. Assume that instruction_b is decoded 
and begins execution during clock cycle 2. Instruction_b completes execution in a single 
clock cycle and is ready to update general purpose register 30 during clock cycle 3. In this 
case, general purpose register 30 is updated incorrectly. 


To preclude this possibility, a register interlock mechanism is employed that guarantees that 
all source and destination registers are read from and written to in proper order. This 
ensures that updates to any given register are always completed in the order specified by 
the program and thus no data is ever incorrectly overwritten in the register files. If an 
instruction is in execute, it does not move into the writeback stage until its destination 
register is guaranteed not to be the destination register for a previously issued, but 
incomplete, instruction. 


7.2.5 Instruction Execute Timing 


Assuming that an instruction has completed its decode stage, and that the required execute 
stage is available, it should be forwarded into an execute stage. There are additional factors 
that must be considered in calculating when and how an instruction moves from its decode 
stage into its execute stage. First, it is possible that the specified execution unit may not be 
available for any additional instructions. In addition, if an instruction happens to stall in the 
IQ, it is possible that the following instructions may bypass the stalled instruction and begin 
execution. This is known as out-of-order instruction issue. 


Both topics are discussed further in Sections 7.2.5.1, "Execution Unit Considerations,” and 
7.2.5.2, "Out-of-Order Instruction Issue.” 


7.2.5.1 Execution Unit Considerations 

As previously noted, the MPC601 is capable of issuing three instructions per clock cycle. 
One of the hindrances in maintaining this peak is the availability of execution units on each 
clock cycle. 


For an instruction to be issued, the required execution unit must be available, or have an 
available spot in its buffer. The sequencer monitors the availability of all execution units 
and suspends instruction issue if the required execution unit is not available. An execution 
unit may not be available under the following circumstances: 


¢ Anexecution unit may become unavailable for additional instructions if its pipeline 
becomes full. This situation may occur if execution takes more clock cycles than the 
number of pipeline stages in the unit and additional instructions are issued to that 
unit to fill the remaining pipeline stages. 


¢ Execution units can accept only one instruction per clock. 


It is important to note that both the integer unit and the floating-point unit each contain a 
one-entry buffer that help reduce the effects of long-latency instructions. Even if a specific 
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stage of one of these pipelines becomes busy for multiple clock cycles, these buffers may 
continue to accept instructions from the IQ. 


7.2.5.2 Out-of-Order Instruction Issue 

As previously mentioned, integer instructions may only be issued from the IQ through 
element 0). Also mentioned was the fact that floating-point instructions may be issued from 
any of the IQ elements 0-3. Since the different execution units are able to pull instructions 
from the IQ through different elements, it is possible for one execution unit to pull an 
instruction from the IQ and begin execution while a previous instruction remains held in 
the IQ. Figure 7-5 shows an example of out-of-order instruction issue on the MPC6()1. 
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Figure 7-5. Instruction Timing—Out-of-Order Execution 


Cache Access 





On clock 0 in Figure 7-5, eight instructions are received from the on-chip cache. During 
clock cycle 1, instructions 0 and 2 are pulled from the IQ while instruction 1 remains. Note 
that the Load Word and Zero (Iwz) instruction issues to the integer unit while the Floating- 
Point Add (fadd) instruction issues to the floating-point unit; thus these two instructions 
can begin processing during the same clock cycle. 


During clock cycle 2, two more instructions are pulled from the IQ. During the execution 
of instruction 0 on clock 2, a request is sent to the on-chip cache for the required data. 


On clock 3, instructions 1 and 3 have been decoded and are ready to begin execution. 
-Unfortunately, instruction 1 has a data dependency on instruction 0, which has not yet 
completed. For this reason, instruction 1 cannot begin execution on this clock cycle. 
Instruction 1 will wait in the IU until its data dependency is resolved. Notice, however, that 
insiruction 2 begins execution during clock cycle 3 even though instruction 2 occurs after 
instruction 1 in the instruction stream. 


On clock cycle 4, instruction 0) completes and data is being written back into the general- 
purpose register file while simultaneously being forwarded to the waiting instruction 1. As 
its source data is fed to it, instruction 1 is able to immediately begin execution. 
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Notice that instruction | appears to be in the execute stage of the integer unit for two clock 
cycles. However, this is only because instruction 1 will move into the execute stage during 
clock cycle 3 while it waits for its source data. No execution of instruction | occurs during 
clock cycle 3. 


Floating-point instructions are able to issue out of order with respect to integer instructions 
and flow-control instructions. Integer instructions cannot issue out of order with respect to 
any other instructions. Flow-control instructions (issued to the BPU) are allowed to issue 
out of order with respect to both integer instructions and floating-point instructions, but not 
with respect to other flow-control instructions. 


7.2.6 Writeback Timing 


There are two writeback buses available for each register file on the MPC601. It is possible 
for more than one instruction to write to the same register file in a given clock cycle. For 
example, a load into the general register file may write its results during the same clock as 
a single-cycle integer instruction. Both of these instructions will require a separate 
writeback bus into the general register file. 


Each of the execution units independently handles data that is being written back. In the 
integer unit, if a subsequent integer instruction is waiting for data, it is forwarded past the 
register file directly into the appropriate execution unit for the immediate execution of the 
waiting instruction. 


In the floating-point unit, if a subsequent floating-point instruction is waiting for data, that 
instruction must wait for the data to be written into the floating-point register file. On the 
next clock cycle, the following instruction may begin decode. In other words, the FPU is 
not equipped with a feed-forwarding mechanism. The exception to this point is when a 
floating-point instruction is waiting for data from a floating-point load operation. In this 
case, the floating-point operation may begin decode during the same clock cycle as the 
floating-point register file is being updated. 


7.3 Execution Unit Timings 


The following sections describe instruction timing within each of the respective execution 
units in the MPC6()1. All timings described are only accurate to within a half-clock cycle. 


7.3.1 Branch Processing Unit Execution Timing 


Flow-control operations (conditional branches, unconditional branches, and traps) are 
typically expensive to execute in most machines because they disrupt normal flow in the 
instruction stream. When a change in program flow occurs, the IQ must be reloaded with 
the target instruction stream. During this time, bubbles can be introduced into the execution 
units. However, previously issued instructions will continue to execute while the new 
instruction stream makes its way into the IQ. 
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Performance features such as branch folding and static branch prediction help minimize the 
penalties associated with flow-control operations on the MPC601. The timing for branch 
instruction execution is determined by many factors including the following: 


¢ Whether the branch is taken 

¢ Whether the target instruction stream is in the on-chip cache 
¢ Whether the branch can be predicted 

¢ Whether the prediction is correct 


7.3.1.1 Branch Folding 


When a branch instruction is encountered in the bottom four elements of the IQ, the 
MPC6()1 immediately tries to pull that instruction out of the IQ and resolve it. When the 
branch processing unit pulls the branch instruction out of the IQ, the instruction above the 
branch is shifted down to take the place of the removed branch. The technique of removing 
the branch instruction from the instruction sequence seen by the other execution units, is 
known as branch folding. 


Often, branch folding reduces the penalties of flow-control instructions to zero since 
instruction execution proceeds as though the branch was never there. 


If the folded branch instruction changes program flow (the branch is said to be “taken” in 
this case), the BPU immediately requests the instructions at the new target from the on-chip 
cache. In most cases, the new instructions arrive in the IQ before any bubbles are 
introduced into the execution units. If the folded branch does not change program flow (the 
branch is said to be “not taken” in this case), the branch is already removed from the 
instruction stream and execution continues as if there were never a branch in the original 
sequence. 


7.3.1.2 Static Branch Prediction 


Static (compiler-directed) branch prediction is a mechanism by which software (for 
example, compilers) can give a hint to the machine hardware about the direction the branch 
is likely to take. When a branch instruction encounters a data dependency, the BPU waits 
for the required condition code to become available. Rather than stalling instruction issue 
until the source operand is ready, the MPC601 predicts which path the branch instruction 
is likely to take, and instructions are fetched and executed along that path. When the branch 
operand becomes available, the branch is evaluated. If the predicted path was correct, 
program flow continues along that path uninterrupted; otherwise, the processor backs up, 
and program flow resumes along the correct path. 


There is a scenario where a flow-control instruction will not be predicted on the MPC601. 
If the target address of the branch (link register) will be modified by an instruction that 
appears before the branch instruction, the branch processing unit must wait until the target 
address is available. 
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The MPC601 may only execute in one level of prediction. In other words, the 
microprocessor may not predict a branch if a prior branch instruction is still unresolved. 
Additionally, while executing down a predicted path, no data or instruction accesses are 
allowed to execute if the access must go off-chip. 


The number of instructions that can be executed conditionally after the issue of a predicted 
branch instruction is limited by the fact that no conditionally executed instruction may 
actually update the register files or memory. That is, instructions may be issued and 
executed conditionally, but may not reach the writeback stage of their pipelines. When a 
conditionally issued instruction has completed execution, it will not be moved into the 
writeback stage, instead, it will simply stall in the last execute phase of that execution unit. 
This means that the execution units may become full, which will limit the number of 
additional instructions that may be issued conditionally. 


In the case of a misprediction, the MPC601 is able to reverse its machine state rather 
painlessly because the programing model has not been updated. When a branch is found 
out to be mispredicted, all instructions that were issued conditionally are simply flushed 
from the execution unit pipelines. No register state needs to be restored because no register 
state was modified conditionally. 


7.3.1.2.1 Predicted “Not Taken” Branch Timing Examples 


Figure 7-6 depicts the case where branch instructions are predicted to be not taken. During 
clock cycle 0, eight instructions arrive into the IQ. During clock cycle 1, instructions 0, 1, 
and 2 are pulled from the IQ and into their respective execution units. Notice that the BPU 
has a combined decode/execute stage, thus the branch (instruction 1) is predicted not to be 
taken during clock cycle 1 because its source register (condition register) is not available. 
The branch is predicted because its source data is not yet available. 


During clock cycle 2, instructions 0 and 2 progress through their pipelines. In addition, the 
branch (instruction 1) remains predicted. Notice that the next branch instruction 
(instruction 5) is not able to begin its decode/execute phase while instruction 1 is predicted. 


During clock cycle 3, instruction 0) begins its writeback stage. The writeback of instruction 
() resolves the data dependency for the first branch (instruction 1); thus the first branch 
becomes resolved and it is determined that the prediction was correct. Recall that only one 
branch may be predicted at a time; thus when instruction 1 is resolved the BPU is free to 
predict instruction 5. 


During clock 4, the second branch instruction remains predicted while additional 
instructions move through the various pipelines. 


During clock cycle 5, the BPU realizes that the prediction made for instruction 5 was 
incorrect. Note that since instruction 6 was issued and executed conditionally, it never 
performed its writeback. As a result of the misprediction, all instructions that followed the 
branch in the instruction stream must be flushed from the respective execution unit 
pipelines. Notice that instructions 6 and 7 do not continue execution since it has been 
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determined that these instructions should have never been issued in the first place. Since 
the branch has been resolved, a request is sent to the on-chip cache for the new instruction 
stream (based on the execution of instruction 5). During clock 6, the new set of instructions 
are in the IQ and the appropriate decoding begins on clock cycle 7. 
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Figure 7-6. Instruction Timing—Branch Not Taken 


7.3.1.2.2 Predicted “Taken” Branch Timing Examples 

Figure 7-7 depicts the case where branch instructions are predicted to be taken. During 
clock cycle 0, eight instructions are fed into the IQ. During clock cycle 1, the first branch 
(instruction 1) is decoded and executed (recall that the branch execution unit has a 
combined decode/execute stage). Note that as the branch is predicted during clock cycle 1, 
a request is sent to the on-chip cache for the new instruction stream. Also during clock cycle 
1, all subsequent instructions are not processed any further. Notice that these instructions 
are not discarded, they are simply not processed. 
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Figure 7-7. Instruction Timing—Branch Taken 


During clock cycle 2, instruction 0 is executed and the branch (instruction 1) is resolved. It 
turns out that this branch was predicted incorrectly. As a result, the instructions that are 
being received from the on-chip cache are discarded. Additionally, processing begins again 
on the subsequent instructions in the IQ. 


It is important to note that the MPC601 does not discard instructions 2—7 in this example. 
The processor does not discard instructions until the new instructions have been received 
from the on-chip cache. This helps in the case of mispredictions (as shown here). 


During clock cycle 3, the next branch (instruction 5) has fallen into one of the bottom 4 
positions in the IQ, and thus, can be pulled out of the IQ into the BPU. The branch may not 
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be resolved immediately, and is predicted to be taken. As a result, processing stops on all 
subsequent instructions while processing continues on all previous instructions 
(instructions 2, 3, and 4). Additionally, a request is sent to the on-chip cache for the new 
instruction stream. 


During clock cycle 4, the new instructions arrive in the IQ, which forces instructions 6 and 
7 to be discarded. During clock 5, the second branch (instruction 5) is resolved and it is 
determined that the prediction was correct. As a result, instruction decode and execute 
continues. 


7.3.2 Integer Unit Execution Timing 


The integer unit executes all integer, bit-field, and data access instructions. Many of these 
instructions execute in a single clock cycle. The integer unit has one execute phase in its 
pipeline, thus when a multi-cycle integer instruction is being executed, no other integer 
instructions may begin an execute phase. Although a multi-cycle integer instruction may 
block the integer execute phase, it does not preclude subsequent integer instructions from 
being decoded. In addition, there is a one-entry buffer between the integer decode stage (IQ 
()) and the integer execute stage. If the execute stage of the integer unit is available, then a 
decoded instruction will fall through the buffer into the execute stage. 


The single execute stage in the integer unit is used differently for the single-cycle and 
multi-cycle integer instructions. For example, single-cycle integer instructions move 
through the execute stage in one clock cycle, as do data access instructions, which use the 
execute stage of the integer unit for address calculation. For multi-cycle integer instructions 
(such as, mul), the same execute stage is used over and over until the multi-cycle 
instruction has completed execution. Figure 7-8 illustrates the instruction flow of the 
integer unit. 
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Figure 7-8. Integer Unit Instruction Flow 
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7.3.2.1 Integer Instructions Timing Examples 


Figure 7-9 illustrates the timing of the integer unit while executing a sequence of integer 
instructions. 
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Figure 7-9. Instruction Timing—Integer Instructions 


Notice that each integer instruction takes only one clock cycle to decode. After the decode 
stage, the integer instruction moves into the execute phase of the integer unit pipeline. 
Notice, however, that the mul instruction uses five clock cycles (clocks 4 - 8) in the execute 
phase of the integer unit. As a result, during clock 5, the subsequent integer instruction 
cannot move into the execute phase. Instead it is held in the IU buffer as it passes from the 
decode stage (IQ-Q) until the execute phase of the integer unit becomes available (during 
clock cycle 9). Although instruction 3 may not enter the execute phase (because that phase 
is still being used by instruction 2) on clock cycle 5, instruction 4 may enter the decode 
stage. 


During clock cycle 9, the mul (instruction 2) instruction moves from the execute stage into 
the writeback stage. As a result, instruction 3 is able to move into the execute stage and 
instruction 4 moves into the buffer since it has already been decoded. After the mul 





7-20 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


instruction is moved out of the execute phase of the pipeline, the single-cycle pipeline 
continues. 


It is important to note that if the integer unit buffer was not present, the timing of the 
instructions shown, with respect to when their writeback stages occur, would be the same. 
The real value of the buffer in the integer unit is that it allows instructions to be pulled out 
of the IQ even when the execute stage in the integer unit is busy. This in turn allows new 
instructions to enter the critical bottom four entries and possibly be decoded/executed by 
the floating-point unit or the branch processing unit. 


7.3.2.2 Data Instructions Timing Examples 


As previously mentioned, the IU also does the address calculation for all data access 
instructions. Once the address calculation is complete (in one clock cycle), the data access 
instruction is moved to another stage, thus allowing additional integer instructions to be 
pushed through the integer unit. 


Figure 7-10 illustrates the timing of a data access instruction as it moves through the IU. 
During clock cycle 0), the instructions arrive into the IQ. During clock cycle 1, instruction 
() is decoded while the subsequent instructions wait in the IQ. During clock cycle 2, 
instruction () begins its execute phase while instruction 1 is decoded. 


During clock cycle 3, instruction 2 is decoded while instruction 0 is writing results back 
into the general purpose register file. Also during clock cycle 3, instruction 1 is executed. 
During the execution of instruction 1, the effective address is calculated and a request is 
sent to the on-chip cache for the data needed. 


On clock cycle 4, the on-chip cache is being accessed and data is being returned. While this 
access occurs, the execute phase of the IU becomes available to service instruction 2. Note 
that even if a cache miss occurred for this data access instruction, the execute phase of the 
IU would become available after only one cycle was used to calculate the effective address. 


As load and store operations move from the integer unit execution stage, they move to the 
cache. If a cache miss occurs, they are inserted in a buffer. There is a separate write buffer 
and read buffer. If a cache miss results from a load operation, that load instruction will 
remain in the read buffer until data is returned from memory and the cache and register file 
are updated. While this load is pending, additional load operations may access the cache. 
However, if a load access misses the cache while the load buffer is still holding a previously 
issued load, then the integer unit may experience a stall. If a cache miss results from a store 
operation, that store instruction will remain in the write buffer until the data has been 
properly stored. While this store is pending, additional store operations may access the 
cache, and may aiso be piaced in the store buffer behind the original store if a cache miss 
occurs. 
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Figure 7-10. Instruction Timing—Data Instructions 


Once the load buffer becomes full, additional load operations which miss the cache, may 
have to wait in the integer unit execution stage for a read buffer entry to become available. 
In addition, once the three-entry write buffer becomes full, additional store operations 
which miss the cache, may have to wait in the integer unit execution stage for a write buffer 
entry to become available. 


For timing information on cache misses, refer to Section 7.2.3.3, “Cache Miss,’ which 
describes instruction fetches that miss in the on-chip cache. 


7.3.3 Floating-Point Unit Execution Timing 


The floating-point unit on the MPC601 executes all floating-point instructions with the 
exception of the floating-point load and store operations. For these instructions, it is the 
integer unit that does the effective address calculation, but a single clock cycle is needed 
from the FPU in order to move the data in or out of the floating-point register file. The 
timing of floating-point load and store instructions with respect to the FPU is discussed 
further in the following paragraphs. 
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The FPU has only two execute phases in its pipeline. There are some floating-point 
operations that need more than two clock cycles of execute time. In the case where the FPU 
must execute these longer latency instructions, both execute phases and the decode phase 
may be used for multiple clock cycles. Figure 7-11 illustrates the pipeline of the floating- 
point unit. Notice the secondary path out of the first and second execute phases that allow 
repeated use of these stages by the same instruction. 
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Figure 7-11. Floating-Point Unit Instruction Flow 


In fact, all double-precision multiply instructions will spend multiple clock cycles in the 
execute phases of the FPU. When a double-precision multiply instruction is encountered, 
it will spend a minimum of two clock cycles in each phase of the FPU pipeline. That is, it 
will spend two clock cycles in the decode phase, a minimum of two clock cycles in FPU 
execute 1, and a minimum of two clock cycles in FPU execute 2. 


When a double-precision floating-point multiply instruction enters the FPU pipeline, 
previous floating-point pipeline stages (except for the FPU buffer) become unavailable. For 
example, when a double-precision multiply instruction moves from the FPU decode stage 
into the FPU execute stages, no other FPU instruction may enter the FPU decode stage until 
the multiply has moved out of the FPU execute stages into the writeback stage. 


7.3.3.1 Floating-Point Instructions Timing Exampies 


The following paragraphs describe a sequence of the floating-point instructions as they pass 
through the various stages of the FPU. 
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Figure 7-12 illustrates an example of the sequence of floating-point instructions. In clock 
(), the eight instructions are fed into the IQ. During clock cycle 1 instruction 0 is pulled from 
the IQ into the floating-point buffer. 
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Figure 7-12. Instruction Timing—Floating-Point Instructions 


During clock 2, instruction 0 is sent into the floating-point decode stage, which makes room 
for instruction 1 to be fed into the floating-point buffer. Instructions () and 1 continue to flow 
through the floating-point pipeline until they complete execution and write back their 
results into the floating-point register file. 


Instruction 2 has a data dependency on instruction 1. In other words, one of the operands 
of instruction 2 is the result of instruction 1. For this reason, instruction 2 may not proceed 
through the floating-point pipeline right behind instruction 1. Instruction 2 is held in the 
floating-point decode stage during clocks 4—7 while instruction 1 completes execution. It 
is during clock cycle 6 that instruction 1 completes execution and updates the floating-point 
register file with its results. However, since there is no feed forwarding mechanism in the 
floating-point unit on the MPC601, instruction 2 must wait until clock 7 before it may begin 
its decode stage. Notice that as instruction 2 is held in the decode stage, instruction 3 is 
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allowed to move into the floating-point buffer. This allows the IQ to be shifted down and 
possibly reveal a branch instruction within the critical bottom 4 elements of the IQ. 


During clock cycle 8, instruction 2 begins its execute phase, instruction 3 begins its decode 
stage, and instruction 4 enters the floating-point buffer. The following clock cycles show 
that the subsequent instructions flow through the FPU in a fully pipelined manner. Although 
it is not shown here, the fdiv operation (instruction 6) will tie up the execute phases of the 
FPU for 14 clock cycles. During this time, instruction 7 will be waiting in the floating-point 
decode stage until the execute phase becomes available. 


7.4 Memory Performance Considerations 


When instruction throughput is capable of three instructions per clock cycle, lack of data 
bandwidth can become a performance bottleneck. In order for the MPC6()1 to approach its 
potential performance levels, it must be able to read and write data quickly and efficiently. 
If there are many processors in a system environment, one processor may experience long 
memory latencies while another bus master (for example, another processor or a direct 
memory access controller) is using the external bus. 


In order to alleviate this possible contention, the MPC601 provides three memory update 
modes: copy-back, write-through, and cache-inhibit. Each page of memory is specified to 
be in one of these modes. If a page is in copy-back mode, data being stored to that page is 
written only to the on-chip cache. If a page is in write-through mode, writes to that page 
update the on-chip cache on hits and always update main memory. If a page is cache- 
inhibited, data in that page will never be stored in the on-chip cache. All three of these 
modes of operation have advantages and disadvantages. A decision as to which mode to use 
depends on the system environment as well as the application. 


This section describes how performance is impacted by each memory update mode. For 
details about the operation of the on-chip cache and the memory update modes, see 
Chapter 4, “Cache and Memory Unit Operation.” 


7.4.1 Copy-Back Mode 


When storing data while in copy-back mode, store operations for cacheable data do not 
necessarily cause an external bus cycle to update memory. Instead, memory updates only 
occur on line replacements, cache flushes, or when another processor attempts to access a 
specitic address for which there is a corresponding dirty cache entry. For this reason, copy- 
back mode may be preferred when external bus bandwidth is a potential bottleneck—for 
example, in a multiprocessor environment. Copy-back mode is also well suited for data that 
is closely coupled to a processor, such as local variables. 


If more than one processor uses data stored in a page that is in copy-back mode, snooping 
must be enabled to allow copy-back operations and cache invalidations of modified data. 
The MPC601 implements snooping hardware to prevent other devices from accessing 
invalid data. When bus snooping is enabled, the processor monitors the transactions of the 
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other devices. For example, if another device accesses a memory location, the MPC601 on- 
chip cache has a modified value for that address, and the memory-coherent (M) bit 
corresponding to that page is set, the processor pre-empts the bus transaction, and updates 
memory with the cache data. The other device is then free to attempt an access to the 
updated memory address. See Chapter 4, “Cache and Memory Unit Operation,” for 
complete information on bus snooping. 


Copy-back mode provides complete cache/memory coherency as well as maximizing 
available external bus bandwidth. 


7.4.2 Write-Through Mode 


Store operations to memory in write-through mode always update memory as well as the 
on-chip cache (on cache hits). Write-through mode is used when the data in the cache must 
always agree with external memory (for example, video memory), or when there is shared 
(global) data that may be used frequently, or when allocation of a cache line on a cache miss 
is undesirable. Automatic copy back of cached data is not performed if that data is from a 
memory page marked as write-through mode since valid cache data always agrees with 
memory. 


Stores to memory that is in write-through mode may cause a decrease in performance. Each 
time a store is performed to memory in write-through mode, the bus will be busy for the 
extra clock cycles required to perform the memory update; therefore, pending load 
operations that miss the on-chip cache must wait while the external store operation 
completes. In addition, since the on-chip cache is shared for both instructions and data, any 
pending instruction fetches from the on-chip cache may also see an undesired latency. 


7.4.3 Cache-Inhibited Accesses 


If a memory page is specified to be cache-inhibited, data from this page will not be stored 
in the on-chip cache. 


Areas of the memory map can be cache-inhibited by the operating system software. If a 
cache-inhibited access hits in the on-chip cache, the corresponding cache line is 
invalidated. If the line is marked as modified, it is copied back to memory before being 
invalidated. 


In summary, the copy-back mode allows both load and store operations to use the on-chip 
cache. The write-through mode allows load operations to use the on-chip cache, but store 
operations cause a memory access and a cache update if the data is already in the cache. 
Lastly, the cache-inhibited mode causes memory access for both loads and stores. 


7.5 Instruction Latency Summary 


Table 7-1 lists the latencies associated with each instruction executed by the MPC601. Note 
that Table 7-1 contains no 64-bit architected instructions. These instructions will trap to an 
illegal instruction exception handler when encountered. Recall that the term latency is 
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defined as the total time it takes to execute an instruction and make ready the results of that 
instruction. 


As previously stated, the FPU has no feed-forwarding capabilities. In other words, as a 
floating-point operation completes, another floating-point instruction that may be waiting 
for those results must wait for the data to be written into the register file before decode can 


begin. This extra time is accounted for in Table 7-1. 
Table 7-1. MPC601 Instruction Latencies 
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Table 7-1. MPC601 Instruction Latencies (Continued) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 


Latency 
(Clocks) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 
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(Clocks) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 


Latency 
(Clocks) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 
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Table 7-1. MPC601 Instruction Latencies (Continued) 


Latency 
(Clocks) 
|subfic | Subtract from Immediate Carrying 
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'These instructions access the system bus, thus the latency may vary depending on the exact state 
of the machine. 





@These instructions serialize the processor if the trap is taken. 


3The longer latency may occur if the contents of rB is larger than 16 bits (not including sign-extend- 
ing bits 


‘Shortest latency occurs if rB <= 16 bits. Longer latency occurs if rB > 16 bits, but most significant 
bit is still 0. Longest latency occurs if most significant bit is 1. 
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Chapter 8 
Signal Descriptions 


This chapter describes the MPC601 microprocessor's external signals. It contains a concise 
description of individual signals, showing behavior when the signal is asserted and negated 
and when the signal is an input and an output. 


NOTE 


A bar over a signal name indicates that the signal is active 
low—for example, ARTRY (address retry) and TS (transfer 
start). Active-low signals are referred to as asserted (active) 
when they are low and negated when they are high. Signals that 
are not active-low, such as APQ-AP3 (address bus parity 
signals) and TTO-TT4 (transfer type signals) are referred to as 
asserted when they are high and negated when they are low. 


The MPC601 signals are grouped as follows: 


Address arbitration signals—The MPC601 uses these signals to arbitrate for address 
bus mastership. 


Address transfer start signals—These signals indicate that a bus master has begun a 
transaction on the address bus. 


Address transfer signals—These signals, which consist of the address bus, address 
parity, and address parity error signals, are used to transfer the address and to ensure 
the integrity of the transfer. 


Transfer attribute signals—These signals provide information about the type of 
transfer, such as the transfer size and whether the transaction is bursted, write- 
through, or cache-inhibited. 


Address transfer termination signals—These signals are used to acknowledge the 
end of the address phase of the transaction. They also indicate whether a condition 
exists that requires the address phase to be repeated. 


Data arbitration signals—The MPC601 uses these signals to arbitrate for data bus 
mastership. 


Data transfer signals—These signals, which consist of the data bus, data parity, and 
data parity error signals, are used to transfer the data and to ensure the integrity of 
the transfer. 
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¢ Data transfer termination signals—Data termination signals are required after each 
data beat in a data transfer. In a single-beat transaction, the data termination signals 
also indicate the end of the tenure, while in burst accesses, the data termination 
signals apply to individual beats and indicate the end of the tenure only after the final 
data beat. They also indicate whether a condition exists that requires the data phase 
to be repeated. 


¢ System status signals—These signals include the external interrupt signal, 
checkstop signals, and both soft- and hard-reset signals. These signals are used to 
interrupt and, under various conditions, to reset the processor. 


¢ Processor state signals—These two signals are used to set the reservation coherency 
bit and set the size of the MPC601’s output buffers. 


¢ Miscellaneous signals—These signals provide information about the state of the 
reservation coherency bit and the size of the MPC6()1's output buffers. 


¢ COP interface signalsk—The common on-chip processor (COP) unit is the master 
clock control unit and it provides a serial interface to the system for performing 
built-in self test (BIST). 


¢ Test interface signals—These signals are used for internal testing. 


¢ Clock signals—These signals determine the system clock frequency. These signals 
8 | can also be used to synchronize multiprocessor systems. 


8.1 Signal Configuration 


Figure 8-1 illustrates the MPC601 microprocessor's pin configuration, showing how the 
signals are grouped. 
NOTE 


A pinout showing actual pin numbers is included in the 
MPC6()1 microprocessor electrical specifications. 
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Figure 8-1. MPC601 Signal Groups 


8.2 Signal Descriptions 


This section describes individual MPC601 signals, grouped according to Figure 8-1. 
Note that the following sections are intended to provide a quick summary of signal 
functions. Chapter 9, “System Interface Operation,” describes many of these signals in 
greater detail, both with respect to how individual signals function and how groups of 
signals interact. 


8.2.1 Address Bus Arbitration Signal 


The address arbitration signals are a collection of input and output signals the MPC6()1 uses 
to request the address bus, recognize when the request is granted, and indicate to other 
devices when mastership is granted. For a detailed description of how these signals interact, 
see Section 9.3.1, “Address Bus Arbitration.” 
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8.2.1.1 Bus Request (BR)—Output 


The bus request (BR) signal is an output signal on the MPC601. Following are the state 
meaning and timing comments for the BR signal. 


State Meaning Asserted—Indicates that the MPC601 is requesting mastership of 
the address bus. See Section 9.3.1, “Address Bus Arbitration.” 


Negated—Indicates that the MPC6()1 is not requesting the address 
bus. The MPC601 may have no bus operation pending, it may be 
parked, or the ARTRY input was asserted on the previous bus clock 
cycle. 


Timing Comments Assertion—Occurs when the MPC601 is not parked and a bus 
transaction is needed. This may occur even if the two possible 
pipeline accesses have occurred. 


Negation—Occurs for at least one bus clock cycle after an accepted, 
qualified bus grant (see BG and ABB), even if another transaction is 
pending. It is also negated for at least one bus clock cycle when the 
assertion of ARTRY is detected on the bus. 


8.2.1.2 Bus Grant (BG)—Input 


The bus grant (BG) signal is an input signal on the MPC601. Following are the state 
meaning and timing comments for the BG signal. 


State Meaning Asserted—Indicates that the MPC601 may, with the proper 
qualification, assume mastership of the address bus. A qualified bus 
grant occurs when BG is asserted and ABB and ARTRY are not 
asserted. The ABB signal is driven by the MPC601 or another bus 
master, but ARTRY is driven only by the bus. If the MPC601 is 
parked, BR need not be asserted for the qualified bus grant. See 
Section 9.3.1, “Address Bus Arbitration.” 


Negated— Indicates that the MPC6()1 is not the next pee: 
address bus master. 


Timing Comments Assertion—May occur at any time to indicate the MPC601 is free to 
use the address bus. After the MPC601 assumes bus mastership, it 
does not check for a qualified bus grant again until the cycle during 
which the address bus tenure is completed (assuming it has another 
transaction to run). The MPC601 does not accept a BG in the cycles 
between the assertion of any TS or XATS and AACK. 


Negation—May occur at any time to indicate the MPC601 cannot 
use the bus. The MPC601 may still assume bus mastership on the bus 
clock cycle of the negation of BG because during the previous cycle 
BG indicated to the MPC601 that it was free to take mastership (if 
qualified). 
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8.2.1.3 Address Bus Busy (ABB) 
The address bus busy (ABB) signal is both an input and an output signal. 


8.2.1.3.1 Address Bus Busy (ABB)—Output 
Following are the state meaning and timing comments for the ABB output signal. 


State Meaning Asserted—Indicates that the MPC601 is the address bus master. See 
Section 9.3.1, “Address Bus Arbitration.” 


Negated— Indicates that the MPC601 is not using the address bus. 
If ABB is negated during the bus clock cycle following a qualified 
bus grant, the MPC601 did not accept mastership, even if BR was 
asserted. This can occur if a potential transaction is aborted 
internally before the transaction is started. 


Timing Comments Assertion—Occurs on the bus clock cycle following a qualified BG 
that is accepted by the processor (see Negated). 


Negation—Occurs on the bus clock cycle following the assertion of 
AACK. If ABB is negated during the bus clock cycle following a 
qualified bus grant, the MPC601 did not accept mastership, even if 
BR was asserted. 


High Impedance—Occurs one-half processor clock cycle after ABB 
is negated. 

8.2.1.3.2 Address Bus Busy (ABB)—Input 

Following are the state meaning and timing comments for the ABB input signal. 

State Meaning Asserted—Indicates that the address bus is in use. This condition 
effectively blocks the MPC601 from assuming address bus 


ownership, regardless of the BG input. Optional. (See Section 9.3.1, 
“Address Bus Arbitration.”) 


Negated—Indicates that the address bus is not owned by another bus 
master and that it is available to the MPC601 when accompanied by 
a qualified bus grant. 


Timing Comments Assertion—May occur when the MPC601 must be prevented from 
using the address bus (and the processor is not currently asserting 


Negation—iviay occur whenever the MPC601 can use the address 
bus. 


Note that this signal is logically ORed with an internally generated address bus busy signal. 
For more information, see Section 9.3.1, “Address Bus Arbitration,” for more information. 
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8.2.2 Address Transfer Start Signals 


Address transfer start signals are input and output signals that indicate that an address bus 
transfer has begun. The transfer start (TS) signal identifies the operation as a memory 
transaction; extended address transfer start CXATS) identifies the transaction as an I/O 
controller interface operation. 


For detailed information about how TS and XATS interact with other signals, refer to 
Section 9.3.2, “Address Transfer,” and Section 9.6, “I/O Controller Interface Operation,” 
respectively. 


8.2.2.1 Transfer Start (TS) 
The transfer start (TS) signal is both an input and an output signal on the MPC601. 


8.2.2.1.1 Transfer Start (TS)—Output 
Following are the state meaning and timing comments for the TS output signal. 


State Meaning Asserted—Indicates that the MPC601 has begun a memory bus 
transaction and that the address-bus and transfer-attribute signals are 
valid. It is also an implied data bus request for a memory transaction 
(unless it is an address-only operation.) 


Negated—Is negated during an I/O controller interface operation. 


Timing Comments Assertion—Coincides with the assertion of ABB. 
Negation—Occurs one bus clock cycle after TS is asserted. 
High Impedance—Coincides with the negation of ABB. 


8.2.2.1.2 Transfer Start (TS)—Input 
Following are the state meaning and timing comments for the TS input signal. 


State Meaning Asserted—Indicates that another master has begun a bus transaction 
and that the address bus and transfer attribute signals are valid for 
snooping (see GBL). 

Negated—Has no meaning. 

Timing Comments Assertion—May occur during the assertion of ABB. 

Negation—Must occur one bus clock cycle after TS is asserted. 


8.2.2.2 Extended Address Transfer Start (XATS) 


The extended address transfer start (XATS) signal is both an input and an output signal on 
the MPC601. 
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8.2.2.2.1 Extended Address Transfer Start (KATS)—Output 
Following are the state meaning and timing comments for the XATS output signal. 


State Meaning Asserted—Indicates that the MPC601 has begun an I/O controller 
interface operation and that the first address cycle is valid. It is also 
an implied data bus request for certain I/O controller interface 
operation (unless it is an address-only operation.) 


Negated—Is negated during an entire memory transaction. 


Timing Comments Assertion—Coincides with ABB. 
Negation—Occurs one bus clock cycle after the assertion of XATS. 


High Impedance—Coincides with the negation of ABB. 
8.2.2.2.2 Extended Address Transfer Start (XATS)—Input 
Following are the state meaning and timing comments for the XATS input signal. 


State Meaning Asserted—Indicates that the MPC601 must check for an I/O 
controller interface operation reply operation with a receiver tag that 
matches bits 28-31 of the MPC601 PID register. 


Negated—Indicates that there is no need to check for an I/O 
controller interface operation reply. 


Timing Comments Assertion—May occur while ABB is asserted. 
Negation—Must occur one bus clock cycle after XATS is asserted. 


8.2.3 Address Transfer Signals 


The address transfer signals are used to transmit the address and to generate and monitor 
parity for the address transfer. For a detailed description of how these signals interact, refer 
to Section 9.3.2, “Address Transfer.” 


8.2.3.1 Address Bus(A0-A31) 
The address bus (AQ—A31) consists of 32 signals that are both input and output signals. 


8.2.3.1.1 Address Bus (A0—-A31)—Output 

Following are the state meaning and timing comments for the AO-A31 output signals. 

State Meaning Asserted/Negated—Represents the physical address of the data to be 
transferred. On burst transfers, the address bus presents the quad- 


word—aligned address containing the critical code/data that missed 
the cache. See Section 9.3.2, “Address Transfer.” 


Timing Comments Assertion/Negation—Occurs on the bus clock cycle after a qualified 
bus grant (coincides with assertion of ABB and TS.) 


High Impedance—Occurs one bus clock cycle after AACK is 
asserted. 


MOTOROLA Chapter 8. Signal Descriptions 8-7 


8.2.3.1.2 Address Bus (A0-A31)—Input 
Following are the state meaning and timing comments for the AO—A31 input signals. 


State Meaning Asserted/Negated—Represents the physical address of a snoop 
operation. 


Timing Comments Assertion/Negation—Must occur on the same bus clock cycle as the 
assertion of TS. 


8.2.3.1.3 Address Bus (A0-A31)—Output (I/O Controller Interface 
Operations) 


Following are the state meaning and timing comments for the address bus signals (AQ- 
A31) for output I/O controller interface operations on the MPC601. 


State Meaning Asserted/Negated—For I/O controller interface operations where 
the MPC601 is the master, the address tenure consists of two packets 
(each requiring a bus cycle). For packet 0), these signals convey 
control and tag information. For packet 1, these signals represent the 
physical address of the data to be transferred. 


Timing Comments Assertion/Negation—Address tenure consists of two beats. The first 
beat occurs on the bus clock cycle after a qualified bus grant, 
coinciding with XATS. The address bus transitions to the second 
beat on the next bus clock cycle. 


High Impedance—Occurs on the bus clock cycle after AACK is 
asserted. 


8.2.3.1.4 Address Bus (A0-—A31)—Input (I/O Controller Interface 
Operations) 

Following are state meaning and timing comments for input I/O controller interface 
operations on the MPC601. 


State Meaning Asserted/Negated—When the MPC601 is not the master, it snoops 
(and checks address parity) on the first address beat only of all I/O 
controller interface operations for an I/O reply operation with a 
receiver tag that matches its PID tag. See Section 9.6, “I/O 
Controller Interface Operation.” 


Timing Comments Assertion/Negation—The MPC60)1 looks for only the first beat of 
the I/O transfer address tenure, which coincides with XATS. The 
second address bus beat is not required by the MPC6()1. 


8.2.3.2 Address Bus Parity (APO-—AP3) 


The address bus parity (APO-AP3) signal is both an input and output signal that has four 
pin locations on the MPC601. 
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8.2.3.2.1 Address Bus Parity (APO—AP3)—Output 
Following are the state meaning and timing comments for the APO—-AP3 output signal on 
the MPC601. 


State Meaning Asserted/Negated—Represents odd parity for each of four bytes of 
the physical address for a transaction. By odd parity, an odd number 
of bits, including the parity bit, are driven high. The signal 
assignments correspond to the following: 

APO. = AQ-A7 

AP1 A8—-A15 

AP2 A16-A23 

AP3 A24-A31 

For more information, see Section 9.3.2.1, “Address Bus Parity.” 


Timing Comments Assertion/Negation—The same as AQ-A31. 
High Impedance—The same as AOQ-A31. 


8.2.3.2.2 Address Bus Parity (APO—AP3)—Input 
Following are the state meaning and timing comments for the APO—-AP3 input signal on the 
MPC601. 


State Meaning Asserted/Negated—Represents odd parity for each of four bytes of 
the physical address for snooping and I/O controller interface 
Operations. Detected even parity causes the processor to enter the 
checkstop state if address parity checking is enabled in the HIDO 
register (see Section 2,.3.3.12.1, “Checkstop Sources and Enables 
Register—HID0). (See also the APE signal description below). 


Timing Comments Assertion/Negation—The same as AO-A31. 


8.2.3.3 Address Parity Error (APE)—Output 

The address parity error (APE) signal is an output signal on the MPC601. Following are the 
state meaning and timing comments for the APE signal on the MPC601. For more 
information, see Section 9.3.2.1, “Address Bus Parity.” 


State Meaning Asserted—Indicates incorrect address bus parity has been detected 
by the MPC601 on a snoop (GBL asserted). This includes the first 
address beat of an I/O controller interface operation. 


Negated—Indicates that the MPC601 has not detected a parity error 
(even parity) on the address bus. 


Timing Coinimeiits Assertion—Cccurs on the second bus clock cycle after TS or KATS 
is asserted. 


Negation—Occurs on the third bus clock cycle after TS or XATS is 
asserted. 
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8.2.4 Address Transfer Attribute Signals 


The transfer attribute signals are a set of signals that further characterize the transfer—such 
as the size of the transfer, whether it is a read or write operation, and whether it is a burst 
or single-beat transfer. For a detailed description of how these signals interact, see 
Section 9.3.2, “Address Transfer.” 


Note that some signal functions vary depending on whether the transaction is a memory 
access or an I/O access. For a description of how these signals function for I/O controller 
interface operations, see Section 9.6, “I/O Controller Interface Operation.” 


8.2.4.1 Transfer Type (TTO-TT4) 


The transfer type (TTO-TT4) signals consist of four input/output signals and one output- 
only signal on the MPC601. For a complete description of TTO-TT4 signals, see Table 8-1 
and for transfer type encodings, see Table 8-2. 


8.2.4.1.1 Transfer Type (TTO-TT4)—Output 
Following are the state meaning and timing comments for the TTO-TT4 output signals on 
the MPC601. 


State Meaning Asserted/Negated—Indicates the type of transfer in progress. These 
bits roughly correspond to the following decoded operations: 
eAtomic 
*Read/write 
*Invalidate 
*Memory cycle 
For I/O controller interface operations these signals are part of the 
1/O transfer code along with TSIZ and TBST. For I/O controller 
interface operations these signals are part of the extended transfer 
code along with TSIZ and TBST: 


XATC(0:7)=TT(0:3)ITBSTIITSIZ(0:2). 


TT4 is driven negated as an output on the MPC6()1 and is defined for 
future expansion. 


Timing Comments Assertion/Negation/High Impedance—The same as AO-A31. 
8.2.4.1.2 Transfer Type (TTO-—TT3)—Input 


Following are the state meaning and timing comments for the TTO-TT3 input signals on 
the MPC601. 


State Meaning Asserted/Negated—Indicates the type of transfer in progress (see 
Table 8-2). For I/O controller interface operations these signals form 
part of the extended address transfer code (XATC) and are snooped 
by the MPC601 if XATS is asserted. 


Timing Comments Assertion/Negation—The same as AO-A31. 
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Table 8-1. TTO-TT4 Signal Description 


Special operations: The MPC601 drives this signal to indicate that the access is part of an atomic data 
access sequence. This signal is asserted whenever a bus transaction is run in response to a 
Iwarx/stwcx instruction pair, a tlbi operation, or either an eciwx or ecowx instruction. 


Read (/write) operations: This signal indicates whether the transaction is a read (TT1 high) or a write 


(TT1 low). 


invalidate operations: When asserted with GBL, the TT2 output signal indicates that all other caches in 


the system should invalidate the cache entry on a snoop hit. If the snoop hit is to a modified entry, the 
sector should be copied back before being invalidated. 


Address-only operations: This signal, when asserted, indicates that the data transfer is to/from memory. 
External logic can synthesize a data bus request from the combined assertions of TS (or XATS) and TT3. 
lf TT3 is not asserted with the address, the associated bus transaction is considered to be a broadcast 
operation that all potential bus masters must honor (or a reserved operation), except for the external 
control functions (eciwx and ecowx) which require both address and data tenures. 


Reserved. Always negated (low state). (For expandability) 





Table 8-2 describes the encodings for TTO-TT3. 
























Table 8-2. Transfer Type Encodings 
[rer [ra [ra] openin —[eitensnin [commen 
1 Flush sector Address only Due to cache contro} 
operation | 
(replacement sector copy 
backs and snoop push 
1 sync Address only Due to cache control 
operation | 
a 
1 External contro! out Single-beat write Caused by ecowx 2 
CC 


Clean sector Address only Due to cache control 
operation’ 
1 Write with kill Cache sector writes 
operations) 
1 1 Kill sector Address only Store hit on shared sector 
or cache control operation ' 
EES EES Write with flush atomic Single-beat write Caused by stwex 





MOTOROLA Chapter 8. Signal Descriptions 8-11 


Table 8-2. Transfer Type Encodings (Continued) 


[ref [re[ re | ewaion [sete [comme 
1 1 Read atomic Single-beat read or burst Caused by Iwarx 
instruction 


1 External control in Single-beat read Caused by ecowx . 


1 1 Read with intent to modify Caused by stwex 
atomic instruction 


'Cache control operations resulting from explicit cache control instructions (for example, dclf, sync, delz, 
dcli). 









2The signal encodings for these operations do not use the TTO and TTS signals in the manner described in 
Table 8-1. 
Note that TT4 is reserved. 
8.2.4.2 Transfer Size (TSIZO-TSIZ2) 
The transfer size (TSIZO-TSIZ2) signals consist of three input/output signals on the 
MPC601. 
8.2.4.2.1 Transfer Size (TSIZO—-TSIZ2)—Output 
Following are state meaning and timing comments for the TSIZO-TSIZ2 output signals on 


~ the MPC601. 


State Meaning Asserted/Negated—For memory accesses, these signals along with 
TBST, indicate the data transfer size for the current bus operation, as 
shown in Table 8-3. Table 9-2 shows how the TSIZ signals are used 
with the address signals for aligned transfers. Table 9-3 shows how 
the TSIZ signals are used with the address signals for misaligned 
transfers. For I/O transfer protocol, these signals form part of the I/O 
transfer code (see the entry in this table for TTO-TT4). 


For external control instructions (eciwx and ecowx), TSIZO-TSIZ2 
are used to output bits 29-3 1to the EAR, which are used to form the 
resource ID (TBSTIITSIZO-TSIZ3). 


Timing Comments Assertion/Negation—The same as A0-A31. 
High Impedance—The same as AO-A31. 
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Table 8-3. Data Transfer Size 


aes [aot 
[eases [coo [Boe 
[gst | 198 [Hoes 
















8.2.4.2.2 Transfer Size (TSIZO—TS1IZ2)—Input 
Following are state meaning and timing comments for the TSIZO—-TSIZ2 input signals on 


the MPC601. 
State Meaning 


Timing Comments 


Asserted/Negated—Represents the size of the current transfer, as 
shown in Table 8-3. For the I/O controller interface protocol, these 
signals form part of the I/O transfer code (see TT). 


Assertion/Negation—The same as AQ-A31. 


8.2.4.3 Transfer Burst (TBST) 
The transfer burst (TBST) signal is an input/output signal on the MPC601. 


8.2.4.3.1 Transfer Burst (TBST)—Output 
Following are the state meaning and timing comments for the TBST output signal. 


State Meaning 


Timing Comments 


MOTOROLA 


Asserted—lIndicates that a burst transfer is in progress when asserted 
and TSIZO-TSIZ2 are set to 010. 


Negated—Indicates that a burst transfer is not in progress. Also, part 
of I/O transfer code (see TT). 


For external control instructions (eciwx and ecowx), TBST are used 
to output bit 28 to the EAR, which are used to form the resource ID 
(TBSTIITSIZO-TSIZ3). 


Assertion/Negation—The same as AQ-A31 
High Impedance—The same as AQ-A31. 
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8.2.4.3.2 Transfer Burst (TBST)—Input 
Following are state meaning and timing comments for the TBST input signal. 


State Meaning Asserted/Negated—Indicates that a burst transfer is in progress 
when asserted and TSIZO-TSIZ2 are set to 010. For the I/O transfer 
protocol, this signal forms part of the I/O transfer code (see the entry 
in this table for TT). 


Timing Comments Assertion/Negation—The same as AO-A31. 


8.2.4.4 Transfer Code (TCO—-TC1)—Output 


The transfer code (TCO-TC1) consists of four output signals on the MPC601. Following 
are the state meaning and timing comments for the TCO-TC1 signals. 


State Meaning Asserted/Negated—Represents a special encoding for the transfer in 
progress (see Table 8-4). 


Timing Comments Assertion/Negation—The same as AQ-A31. 
High Impedance—The same as AQ-A31. 


Table 8-4. Encodings for TCO-TC3 


Os eee. eee eee! 


TCO Depends on whether the current transaction is a read or write operation; therefore, TCO should be 
used with TT1. On a read operation, TCO asserted indicates the transaction is an instruction fetch 
operation; otherwise, the read operation is a data operation. 

Asserting TCO for write operations indicates the write is a response to a snoop hit to modified data; 
TCO negated indicates the write is nota snoop push (it is therefore a cache cast-out, write-through, or 
cache-inhibited write operation). 















TC1, when asserted, indicates that an operation to reload the other sector is queued; therefore, the 
next bus transaction will likely be to the same page of memory. After the addressed sector in a cache 
line is loaded from memory, the MPC601 attempts to load the other sector in the cache line. This is a 
low-priority bus operation and may not be the next transaction. The assertion of TC1 suggests that the 
next access may be to the same page; the hint may be wrong depending on the bus traffic/code 

execution dynamics. 








8.2.4.5 Cache Inhibit (Cl)—Output 


The cache inhibit (CT) signal is an output signal on the MPC601. Following are the state 
meaning and timing comments for the CT signal. 


State Meaning Asserted—lIndicates that a single-beat transfer will not change the 
cache, reflecting the setting of the I bit for the address of the current 
transaction. 


Negated—Indicates that a burst transfer will allocate a sector in the 
MPC601 data cache. 


Timing Comments Assertion/Negation—The same as AQ-A31 
High Impedance—The same as AO-A31. 
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8.2.4.6 Write-through (WT)—Output 


The write-through (WT) signal is an output signal on the MPC601. Following are the state 
meaning and timing comments for the WT signal. 


State Meaning Asserted—Indicates that a single beat transaction is write-through, 
reflecting the value of the W bit for the address of the current 
transaction. 


Negated—Indicates that a transaction is not write-through. 


Timing Comments Assertion/Negation—The same as AOQ-A31 
High Impedance—The same as AQ-A31. 


8.2.4.7 Global (GBL) 
The global (GBI) signal is an input/output signal on the MPC601. 


8.2.4.7.1 Global (GBL)—Output 
Following are the state meaning and timing comments for the GBL output signal. 
State Meaning Asserted—lIndicates that a transaction is global, reflecting the setting 


of the M bit for the address of the current transaction (except in the 
case of copy-back operations, which are non-global.) 


Negated—Indicates that a transaction is not global. 
Timing Comments Assertion/Negation—The same as AQO-A31 

High Impedance—The same as AQ-A31 
8.2.4.7.2 Global (GBL)—Input 
Following are the state meaning and timing comments for the GBL input signal. 


State Meaning Asserted— Indicates that a transaction must be snooped by the 
MPC601. 


Negated—Indicates that a transaction is not snooped by the MPC601 
(even if TTO—TT4 indicate an invalidation transaction). 


Timing Comments Assertion/Negation—The same as AQ-A31. 


8.2.4.8 Cache Set Element (CSEQ—CSE2)—Output 


The cache set element (CSEQ-—CSE2) signals consist of three output signals on the 
MPC6()1. Following are state meaning and timing comments for the CSE signals. 


State Meaning Asserted/Negated—Represents the cache replacement set element 
for the current transaction reloading into or writing out of the cache. 
Can be used with the address bus and the transfer attribute signals to 
externally track the state of each cache sector in the MPC601’s 
cache. 
See Section 4.7.4, “MESI Hardware Considerations.” 
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Timing Comments Assertion/Negation—The same as AO-A31 
High Impedance—The same as AQ-A31. 


8.2.4.9 High-Priority Snoop Request (HP_SNP_REQ) 

The High-Priority Snoop Request (HP_SNP_REQ) signal is an input signal (input-only) on 
the MPC601. Following are state meaning and timing comments for the HP_SNP_REQ 
signal 


State Meaning Asserted—lIndicates that the MPC601 may add an additional 
reserved queue to the list of available queues for push transactions 
that are a result of a snoop hit. 


Negated—Indicates that the MPC601 will not make available the 
reserved queue for a snoop hit push resulting from a transaction. This 
is the “normal” mode. 

Timing Comments Assertion/Negation—The same as AQ-A31. 


NOTE: This pin is a feature of the MPC601 only and will not be available in any other 
PowerPC processors. 


8.2.5 Address Transfer Termination Signals 


The address transfer termination signals are used to indicate either that the address phase 
of the transaction has completed successfully or must be repeated, and when it should be 
terminated. These signals are also used to maintain MESI protocol. For detailed 
information about how these signals interact, see Section 9.3.3, “Address Transfer 
Termination.” 


8.2.5.1 Address Acknowledge (AACK)—Input 


The address acknowledge (AACK) signal is an input signal (input-only) on the MPC601. 
Following are state meaning and timing comments for the AACK signal. 


State Meaning Asserted—Indicates that the address phase of a transaction is 
complete. The address bus will go to a high impedance state on the 
next bus clock cycle. The MPC601 samples ARTRY on the bus clock 
cycle following the assertion of AACK. 


Negated—(During ABB) indicates that the address bus and the 
transfer attributes must remain driven. 


Timing Comments Assertion—May occur as early as the bus clock cycle after TS or 
XATS is asserted; assertion can be delayed to allow adequate address 
access time for slow devices. For example, if an implementation 
supports slow snooping devices, an external arbiter can postpone the 


assertion of AACK. 
Negation—Miust occur one bus clock cycle after the assertion of 
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8.2.5.2 Address Retry (ARTRY) 
The address retry (ARTRY) signal is input/output signal on the MPC601. 


8.2.5.2.1 Address Retry (ARTRY)—Output 

Following are the state meaning and timing comments for the ARTRY output signal. 

State Meaning Asserted—lIndicates that the MPC6()1 detects a condition in which a 
snooped address tenure must be retried (see SHD for encoding). If 


the MPC601 needs to update memory as a result of the snoop that 
caused the retry, the MPC601 asserts BR (unless it is parked). 


High Impedance—Indicates that the MPC601 does not need the 
snooped address tenure to be retried. 


Timing Comments Assertion—Occurs two bus cycles immediately following the 
assertion of TS if a retry is required. 


Negation—Occurs the bus cycle after the assertion of AACK. Since 
this signal may be simultaneously driven by multiple devices, it 
negates in a unique fashion. First the buffer goes to high impedance 
for one bus cycle, then it drives high for one 2XPCLK cycle before 
returning to high impedance. 


This special method of negation may be disabled using the mtspr 
instruction to write bit 29 of the HIDO register. 


Table 8-6 shows the relationship between the SHD and ARTRY signals. 
Table 8-5. SHD and ARTRY Signals 


[as [mee [sein | 
No snoop hit, no busy 
pipeline 






Snoop hit shared 
Snoop hit modified 


8.2.5.2.2 Address Retry (ARTRY)—Input 
Following are the state meaning and timing comments for the ARTRY input signal. 


State Meaning Asserted—If the MPC601 is the address bus master, ARTRY 
indicates that the MPC601 must retry the preceding address tenure 
and immediately negate BR (if asserted). If the MPC601 is not the 
address bus master, this input indicates that the MPC601 should 
immediately negate BR for one bus clock cycle following the 
negation of ARTRY. Note that the subsequent address retried may 
not be the same one associated with the assertion of the ARTRY 
signal. 
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Negated/High Impedance—Indicates that the MPC601 does not 
need to retry the last address tenure. 


Timing Comments Assertion—Must occur by the bus clock cycle immediately 
following the assertion of AACK if a retry is required. 


Negation—Must occur during the second cycle after the assertion of 
AACK. Note that this signal is sampled only following the assertion 
of AACK. 


8.2.5.3 Shared (SHD) 
The shared (SHD) signal is an input/output signal on the MPC601. 


8.2.5.3.1 Shared (SHD)—Output | 
Following are the state meaning and timing comments for the SHD output signal. 


State Meaning Asserted—Indicates that the MPC601 either needs the data to be 
shared (in response to a snoop hit for transaction not requiring 
invalidation) or with ARTRY indicates the MPC6()i has a hit ona 
cache sector marked as modified. 


Negated/High Impedance—Indicates that the MPC601 did not have 
a cache hit on the snooped address. 


Timing Comments Assertion—The same as ARTRY. 
Negation—The same as ARTRY. 
High Impedance—The same as ARTRY. 


See Table 8-6 for information on SHD and ARTRY signals. 
Table 8-6. SHD and ARTRY Signals 


[a [ a | onciion 
No snoop hit, no busy 
pipeline 






Snoop hit shared 
Snoop hit modified 





8.2.5.3.2 Shared (SHD)—Input 
Following are the state meaning and timing comments for the SHD input signal. 


State Meaning Asserted—Indicates that for a self-generated transaction, the 
MPC6(0)1 must allocate the incoming sector as shared (unmodified). 
Or if ARTRY is asserted, the transaction must be retried while the 
other master updates memory. 
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Negated—Indicates that the address for the current transaction is not 
in any other cache. 


Timing Comments Assertion—The same as ARTRY. 
Negation—The same as ARTRY. 


8.2.6 Data Bus Arbitration Signals 


Like the address bus arbitration signals, data bus arbitration signals maintain an orderly 
process for determining data bus mastership. Note that there is no data bus arbitration signal 
equivalent to the address bus arbitration signal BR (bus request), because, except for 
address-only transactions, TS and XATS imply data bus requests. For a detailed description 
on how these signals interact, see Section 9.4.1, “Data Bus Arbitration.” 


One special signal, DBWO, allows the MPC601 to be configured dynamically to write data 
out of order with respect to read data. For detailed information about using DBWO, see 
Section 9.10, “Using DBWO (Data Bus Write Only).” 


8.2.6.1 Data Bus Grant (DBG)—Input 
The data bus grant (DBG) signal is an input signal (input-only) on the MPC601. Following 
are the state meaning and timing comments for the DBG signal. 


State Meaning Asserted—Indicates that the MPC601 may, with the proper 
qualification, assume mastership of the data bus. The MPC601 
derives a qualified data bus grant when DBG is asserted and DBB, 
DRTRY, and ARTRY are negated; that is, the data bus is not busy 
(DBB is negated) and there is no outstanding attempt to retry the 
associated address tenure (ARTRY is negated) or the current data 
tenure (DRTRY is negated). 


Negated—Indicates that the MPC601 must hold off its data tenures. 


Timing Comments Assertion—May occur any time to indicate the MPC601 is free to 
take data bus mastership. It is not sampled until TS is asserted. 


Negation—May occur at any time to indicate the MPC601 cannot 
assume data bus mastership. 


8.2.6.2 Data Bus Write Only (DBWO)—Input 


The data bus write only (DBWO) signal is an input signal (input-only) on the MPC601. 
Following are the state meaning and timing comments for the DBWO signal. 
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State Meaning Asserted—Indicates that the MPC601 may run the data bus tenure 
for an outstanding write address even if a read address is pipelined 
before the write address. If DBW0O is asserted, the MPC601 only 
assumes data bus ownership for a pending data bus write operation 
(that is, the MPC601 does not take the data bus for a pending read 
operation if this input is asserted along with DBG). Care must be 
taken with BG to ensure the desired write is queued (such as a snoop 
hit push). Refer to Section 9.10, “Using DBWO (Data Bus Write 
Only),” for detailed instructions for using DBWO. 


Negated—Indicates that the MPC601 must run the data bus tenures 
in the same order as the address tenures. 


Timing Comments Assertion—Must occur no later than a qualified DBG for a previous 
write tenure. Do not assert if no pending data bus write tenures are 
pending from previous address tenures. 


Negation—May occur any time after a qualified DBG and before the 
next assertion of DBG. 


8.2.6.3 Data Bus Busy (DBB) 
The data bus busy (DBB) signal is input/output signal on the MPC601. 


8.2.6.3.1 Data Bus Busy (DBB)—Output 
Following are the state meaning and timing comments for the DBB output signal. 


State Meaning Asserted—Indicates that the MPC601 is the data bus master. The 
MPC6(0)1 always assumes data bus mastership if it needs the data bus 
and is given a gualified data bus grant (see DBG). 


Negated—Indicates that the MPC601 is not using the data bus. 


Timing Comments Assertion—Occurs during the bus clock cycle following a qualified 
DBG. 


Negation—Occurs during the bus clock cycle following the 
assertion of the final TA. 


High Impedance—Occurs one-half processor clock cycle after DBB 
is negated. 


8.2.6.3.2 Data Bus Busy (DBB)—Input 
Following are the state meaning and timing comments for the DBB input signal. 


State Meaning Asserted—Indicates that another device is bus master. 
Negated—Indicates that the data bus is free (with proper 
qualification, see DBG) for use by the MPC601. 
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Timing Comments Assertion—Must occur when the MPC601 must be prevented from 
using the data bus. 


Negation—May occur whenever the data bus is available. 


8.2.7 Data Transfer Signals 


Like the address transfer signals, the data transfer signals are used to transmit data and to 
generate and monitor parity for the data transfer. For a detailed description of how the data 
transfer signals interact, see Section 9.4.2, “‘Data Transfer.” 


8.2.7.1 Data Bus (DHO-DH31, DLO—-DL31) 


The data bus (DHO-DH31 and DLO-DL31) consists of 64 signals that are both input and 
output on the MPC601. Following are the state meaning and timing comments for the DH 
and DL signals. 


State Meaning The data bus has two halves—data bus high (DH) and low (DL). See 
Table 8-6 for the data bus lane assignments. Data byte lanes are 
illustrated in Figure 9-10. 1/O controller interface operations use DH 
exclusively (that is, there are no 64-bit, I/O transfers). 


Timing Comments The data bus is driven once for non-cached transactions and four 
times for cache transactions (bursts). | 8 | 


Table 8-7. Data Bus Lane Assignments 


fori 
[onieones if ed 
[orerowss 
a 
ee 


8.2.7.1.1 Data Bus (DHO—DH31, DLO-DL31)—Output ; 
Following are the state meaning and timing comments for the DH and DL output signals. 


State Meaning Asserted/Negated—Represents the state of data during a data write. 
Unused byte lanes are driven to deterministic values. 


Timing Comments Assertion/Negation—Initial beat coincides with DBB and, for 
bursts, transitions on the bus clock cycle following each assertion of 
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High Impedance—Occurs on the bus clock cycle after the final 
assertion of TA. 


8.2.7.1.2 Data Bus (DHO—DH31, DLO-DL31)—Input 
Following are the state meaning and timing comments for the DH and DL input signals. 


State Meaning 


Timing Comments 


Asserted/Negated—Represents the state of data during a data read 
transaction. 


Assertion/Negation—Must occur on the same bus clock cycle that 
TA is asserted; however, if DRTRY is asserted, it must coincide with 
the assertion of the final DRTRY for a given data beat. 


8.2.7.2 Data Bus Parity (DPO—DP7) 
The eight data bus parity (DPO-DP7) signals on the MPC601 are both output and input 


signals. 


8.2.7.2.1_ Data Bus Parity (DPO-DP7)—Output 
Following are the state meaning and timing comments for the DP output signals. 


State Meaning 


Timing Comments 


8-22 


Asserted/Negated—Represents odd parity for each of eight bytes of 
data write transactions. The signal assignments are listed in 
Table 8-8. 


Assertion/Negation—The same as DLO-DL31 
High Impedance—The same as DLO-DL31 


Table 8-8. DPO—DP7 Signal Assignments 


[arose oar 
[oreo 
ar 
[ars 











DL8-DL15 
DLi6-DL23 
DL24—DL31 
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8.2.7.2.2 Data Bus parity (DPO—DP7)—Input 
Following are the state meaning and timing comments for the DP input signals. 


State Meaning Asserted/Negated—Represents odd parity for each byte of read 
data. Parity is checked on all data byte lanes, regardless of the size 
of the transfer. Detected even parity causes a machine-check 
exception if data parity errors are enabled in the ME bit of the MSR. 
(See DPE.) 


Timing Comments _Assertion/Negation—The same as DLO-DL31. 


8.2.7.3 Data Parity Error (DPE)—Output 
The data parity error (DPE) signal is an output signal (output-only) on the MPC601. 
Following are the state meaning and timing comments for the DPE signal. 


State Meaning Asserted—Indicates incorrect data bus parity. 
Negated—lIndicates correct data bus parity 


Timing Comments Assertion—Occurs on the second bus clock cycle after TA is 
asserted to the MPC601. 


Negation—Occurs on the third bus clock cycle after TA is asserted 
to the MPC601. 


8.2.8 Data Transfer Termination Signals 


Data termination signals are required after each data beat in a data transfer. Note that in a 
single-beat transaction, the data termination signals also indicate the end of the tenure, 
while in burst accesses, the data termination signals apply to individual beats and indicate 
the end of the tenure only after the final data beat. 


These signals are also used to maintain MES] protocol. For a detailed description of how 
these signals interact, see Section 9.4.3, “Data Transfer Termination.” 


8.2.8.1 Transfer Acknowledge (TA)—Input 


The transfer acknowledge (TA) signal is an input signal (input-only) on the MPC601. 
Following are state meaning and timing comments for the TA signal. 


State Meaning Asserted— Indicates that a single-beat data transfer completed 
successfully or that a data beat in a burst transfer completed 
successfully (unless DRTRY is asserted on the next bus clock cycle). 
Note that TA must be asserted for each data beat in a burst 
transaction. For more information refer to Section 9.4.3, “Data 
Transfer Termination.” 


Negated—(During DBB) indicates that, until TA is asserted, the 
MPC6()1 must continue to drive the data for the current write or must 
wait to sample the data for reads. 
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Timing Comments Assertion—Must not occur before AACK for the current transaction 
(if the address retry mechanism is to be used; otherwise, assertion 
may occur at any time during the assertion of DBB. The system can 
withhold assertion of TA to indicate that the MPC601 should insert 
wait states to extend the duration of the data beat. 


Negation—Must occur after the bus clock cycle of the final (or only) 
data beat of the transfer. For a burst transfer, the system can assert TA 
for one bus clock cycle and then negate it to advance the burst 
transfer to the next beat and insert wait states during the next beat. 


8.2.8.2 Data Retry (DRTRY)—Input 


The data retry (DRTRY) signal is input only on the MPC601. Following are state meaning 
and timing comments for the DRTRY signal. 


State Meaning Asserted—Indicates that the MPC601 must invalidate the data from 
the previous read operation. 


Negated—Indicates that data presented with TA on the previous read 
operation is valid. This is essentially a late TA to allow speculative 
forwarding of data (with TA) during reads. Note that DRTRY is 
ignored for write transactions 


Timing Comments Assertion—Miust occur during the bus clock cycle immediately after 
TA is asserted if a retry is required. The DRTRY signal may be held 
asserted for multiple bus clock cycles. 


Negation—Must occur during the bus clock cycle after a valid data 
beat. This may occur several cycles after DBB is negated, effectively 
extending the data bus tenure. 


8.2.8.3 Transfer Error Acknowledge (TEA)—Input 


The transfer error acknowledge (TEA) signal is input only on the MPC601. Following are 
state meaning and timing comments for the TEA signal. 


State Meaning Asserted—Indicates that a bus error occurred. Causes a machine 
check exception (and possibly causes the processor to enter 
checkstop state if machine check enable bit is cleared (MSR[ME] = 
()). For more information see Section 5.4.2.2, “Checkstop State 
(MSR[ME] = 0).” Assertion terminates the current transaction; that 
is, assertion of TA and DRTRY are ignored. The assertion of TEA 
causes the negation/high impedance of DBB in the next clock cycle. 
However, data entering the GPR or the cache are not invalidated. 


Negated—Indicates that no bus error was detected. 
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Timing Comments Assertion—May be asserted while DBB and/or DRTRY is asserted. 


Negation—TEA must be asserted for at least one bus clock cycle. 
TEA must be negated no later than the negation of DBB or the last 
DRTRY. 


8.2.9 System Status Signals 


Most system status signals are input signals that indicate when exceptions are received, 
when checkstop conditions have occurred, and when the MPC601 must be reset. The 
MPC601 generates the output signal, CKSTP_OUT, when it detects a checkstop condition. 
For a detailed description of these signals, see Section 9.7, “Interrupt, Checkstop, and Reset 
Signals.” 


8.2.9.1 Interrupt (INT)—Input 


The interrupt (INT) signal is input only. Following are state meaning and timing comments 
for the INT signal. 


State Meaning Asserted—lIndicates that if the MSR[EE] (bit 16, the external 
interrupt enable bit) is set, the MPC6()1 begins processing an 
external interrupt exception. 


Negated—Indicates that normal operation should proceed. See 
Section 9.7.1, “External Interrupt.” 


Timing Comments Assertion—May occur at any time. 
Negation—May occur any time after the minimum pulse width has 
been met. (Minimum pulse width is 3 processor clock cycles.) After 
the minimum pulse width has been met, an interrupt exception 
occurs, 


8.2.9.2 Checkstop Input (CKSTP_IN)—Input 
The checkstop input (CKSTP_IN) signal is input only on the MPC601. Following are state 
meaning and timing comments for the CKSTP_IN signal. 


State Meaning Asserted—lIndicates that the MPC601 must terminate operation by 
internally gating off all clocks. Once CKSTP_IN has been asserted 
it must remain asserted until the system has been reset; otherwise the 
clocks resume operation. 


Negated—Indicates that normal operation should proceed. See 
Section 9.7.2, “Checkstops.” 
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Timing Comments 


Assertion—May occur at any time and may be asserted 
asynchronously to the input clocks. CKSTP_IN must be asserted for 
a minimum of three PCLK_EN clock cycles. Or, it may be asserted 
synchronously meeting setup and hold times (specified in the 
electrical specifications) and must be asserted for at least two 


PCLK_EN clock cycles. 


Negation—May occur any time after the CKSTP_OUT output signal 
has been asserted. 


8.2.9.3 Checkstop Output (CKSTP_OUT)—Output 


The checkstop output (CKSTP_OUT) signal is output only on the MPC601. Following are 
state meaning and timing comments for the CKSTP_OUT signal. 


State Meaning 


Timing Comments 


Asserted—lIndicates that the MPC60)1 has detected a checkstop 
condition and has ceased operation. 


Negated—Indicates that the MPC6()1 is operating normally. 
See Section 9.7.2, “Checkstops.” 


Assertion—May occur at any time and may be asserted 
asynchronously to the MPC6()1 input clocks. 


Negation—Requires HRESET assertion. 


8.2.9.4 Reset Signals 


There are two reset signals on the MPC601—hard reset (HRESET) and soft reset 
(SRESET). Descriptions of each follows. 


8.2.9.4.1 Hard Reset (HRESET)—Input 
The hard reset (HRESET) signal is input only and must be used at power-on to properly 
reset the processor. Following are state meaning and timing comments for the HRESET 


signal. 
State Meaning 


8-26 


Asserted—lInitiates a complete hard reset operation when this input 
transitions from asserted to negated. Causes a reset exception as 
described in Section 5.4.1.2, “Hard Reset.” Output drivers are 
released to high impedance within three clocks after the assertion of 


Negated—lIndicates that normal operation should proceed. See 
Section 9.7.3, “Reset Inputs.” 
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Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the MPC6()1 input clocks. 
Negation—May occur any time after the minimum reset pulse width 
has been met. (Minimum pulse width is 300 processor clock cycles.) 


This input has additional functionality in certain test modes. 

8.2.9.4.2 Soft Reset (SRESET)—Input 

The soft reset (SRESET) signal is input only. Following are state meaning and timing 
comments for the SRESET signal. 


State Meaning Asserted— Initiates processing for a a reset exception as described 
in Section 5.4.1.1, “Soft Reset.” 


Negated—Indicates that normal operation should proceed. See 
Section 9.7.3, “Reset Inputs.” 


Timing Comments Assertion—May occur at any time. 
Negation—May occur any time after the minimum soft-reset pulse 
width has been met. (Minimum pulse width is 10 processor clock 
cycles.) 


This input has additional functionality in certain test modes. 
8.2.9.5 System Quiesced (SYS_QUIESC) 


The system quiesced (SYS_QUIESC) signal is input only. Following are state meaning and 
timing comments for the SYS_QUIESC signal. 


State Meaning Asserted—Enables soft stop in the MPC601 
Negated: indicates that soft stop is not enabled in the 
MPC6()1 processor. 


Timing Comments Assertion/Negation—Must meet setup and hold times as described 
in the electrical specifications. 


8.2.9.6 Resume (RESUME) 


The resume (RESUME) signal is input only. Following are state meaning and timing 
comments for the RESUME signal. 


State Meaning Asserted—Restarts the MPC6()1 after a soft stop. 


Negated—Indicates that the MPC601 is not allowed to resume 
normal operation if a soft stop has occurred. 
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Timing Comments Assertion—May occur at any time and may be asserted 
asynchronously to the MPC601 input clocks. RESUME must be 
asserted for a minimum of three PCLK_EN clock cycles. Or, it may 
be asserted synchronously meeting setup and hold times (specified in 
the electrical specifications) and must be asserted for at least two 
PCLK_EN clock cycles. For information, see the MPC60)1 electrical 
specifications. 


Negation—May occur any time after the minimum pulse width has 
been met. 


8.2.9.7 Quiesce Request (QUIESC_REQ) 
The quiesce request (QUIESC_REQ) signal is output only. Following are state meaning 
and timing comments for the QUIESC_REQ signal. 


State Meaning Asserted—Indicates that the MPC601 is requesting a soft stop for 
the system. 


Negated—Indicates that the MPC601 is operating normally. 


Timing Comments Assertion—May occur at any time to indicate that the MPC601 is 
requesting a soft stop. 


Negation: may occur at any time to indicate that the MPC601 is not 
requesting a soft stop. 


8.2.9.8 Reservation (RSRV)—Output 


The reservation (RSRV) signal is output only on the MPC601. Following are state meaning 
and timing comments for the RSRV signal. 


State Meaning Asserted/Negated—Represents the state of the reservation 
coherency bit in the reservation address register that is used by the 
Iwarx and stwex instructions. See Section 9.8.1, “Support for the 
lwarx/stwcx. Instruction Pair.” 

Timing Comments Assertion/Negation—Occurs synchronously with respect to bus 
clock cycles. The execution of an lwarx instruction sets the internal 
reservation condition when the next bus transition occurs, RSRV is 
asserted. 


8.2.9.9 Driver Mode (SC_DRIVE) 
The driver mode (SC_DRIVE) signal is input only on the MPC6()1. Following are state 
meaning and timing comments for the SC_DRIVE signal. 


State Meaning Asserted—Indicates that the drive current for the following output 
buffers is increased; ABB, DBB, ARTRY, SHD, TS, XATS. 
(approximately 2x). 
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Negated—tThe drive current for the six signals above will be the 
same as all other signals for the MPC601. 


Timing Comments Assertion/Negation—This is not a dynamic signal; it must not 
transition after HRESET is negated. 


8.2.10 ESP/Scan Interface 

The MPC601 has extensive on-chip test capability including the following: 
¢ Built-in Self test (BIST) 
¢ Debug control/observation (ESP) 
¢ Boundary scan . 


The built-in self test hardware is exercised as part of the POR sequence. The ESP and 
boundary scan logic are not used under typical operating conditions. 


Detailed discussion of the MPC601 test functions is beyond the scope of this document; 
however, sufficient information has been provided to allow the system designer to disable 
the test functions that would impede normal operation. 


The test interface is provided for testing. Table 8-9 describes the test interface signals. For 
more information, refer to Section 9.9, “IEEE 1149.1-Compatible Interface.” The interface 
is shown in Figure 8-2. 












TDI module lead #186 
TMS module lead #184 
TCK module lead #187 
BSCAN_EN module lead #186 


TDO module lead #078 


TRST module lead #279 


Figure 8-2. IEEE 1149.1-Compatible Boundary Scan Interface 
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Table 8-9. ESP/Scan Interface 


This output signal is a no connect (NC) for non-test board designs. 
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8.2.11 Test Signals 


Table 8-10 describes the MPC6()1’s test and COP signals. The value in the operational level 
column should be used for normal operations. 


This output signal is ano connect (NC) for non-test board designs 


Table 8-10. Test Interface 
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Table 8-10. Test Interface (Continued) 


Name Level 
Po es ee 
pase [ef 
Parser [ee ft 
oc 
pastes [em 












8.2.12 Clock Signals 


The clock signal inputs of the MPC6()1 determine the system clock frequency and provide 
a flexible clocking scheme that allows the processor to operate at an integer multiple of the 
system clock frequency. 


Refer to the MPC601 electrical specifications for exact timing relationships of the clock 
signals. 


8.2.12.1 Double-Speed Processor Clock (2X_PCLK)—Input 

The double-speed processor clock (2X_PCLK) signal is input only on the MPC6(0)1. This 
signal is the highest frequency input to the MPC601; it switches at twice the frequency of 
the internal P_CLOCK provided that the PCLK_EN signal is half the frequency of the 
2X_PLCLK as shown in Figure 8-3. This input clocks the latch that samples the PCLK_EN 
input, providing duty-cycle control for the internal P_CLOCK (see Figure 8-3). 


Following are state meaning and timing comments for the 2X_PCLK signal. 


State Meaning Rising Edge—Is the clocking edge for a synchronizing latch used to 
generate the internal processor clock (see PCLK_EN). See 
Section 8.2.12, “Clock Signals." 


Timing Comments Duty cycle—Refer to the MPC601 electrical specifications. 


8.2.12.2 Clock Phase (PCLK_EN)—Input 


The clock phase (PCLK_EN) signal is input only on the MPC601. The PCLK_EN signal 
switches at the same frequency as the internal CPU clock (P_CLOCK in Figure 8-3). The 
PCLK_EN signal determines the phase of the internal P_CLOCK (timing and duty cycle 
are derived from the 2X_PCLK input); therefore, this input can be used to synchronize 
multiple MPC601s. 


Figure 8-3 shows how the internal P_CLOCK is always identical to the PCLK_EN signal 
except it is inverted and delayed by one full 2X_PCLK cycle. 


The MPC601 can tolerate dynamic P_CLOCK cycle stretching. This can be accomplished 
by altering the duty cycle of the PCLK_EN input. For example, the system can extend a 
given CPU clock cycle by negating PCLK_EN for more than one 2X_PCLK cycle. This 
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effectively delays the bus clock input sampling points and output drive points in half of a 
processor cycle increments and further delays execution of instructions accordingly. 


(internal) 
P_CLOCK 


2X_PCLK 





Figure 8-3. Internal P_CLOCK Generation 


Following are state meaning and timing comments for the PCLK_EN signal. 


State Meaning Asserted—Indicates that the MPC601 should generate the high 
phase of the internal processor clock synchronized to 2X_PCLK. 
See Section 8.2.12, “Clock Signals.” 


Negated—Indicates that MPC601 should generate the low phase of 
the internal processor clock synchronized to 2X_PCLK. 


Timing Comments Assertion—May occur one 2X_PCLK cycle after the negation of 
PCLK_EN with appropriate setup to the falling edge of 2X_PCLK. 


Negation—Must occur one 2X_PCLK cycle after the assertion of 
PCLK_EN with appropriate setup to the falling edge of 2X_PCLK. 


8.2.12.3 Bus Phase (BCLK_EN)—Input 

The bus phase (BCLK_EN) signal is input only on the MPC601. This input determines, in 
conjunction with PCLK_EN and 2X_PCLK, the transition timing for the MPC601 bus 
interface. While all timing is derived from the rising edge of the 2X_PCLK input, the two 
phase inputs qualify the edge on which the processor and bus interface sequential logic can 
proceed. Inputs are sampled and outputs are driven with the qualified rising edge of the 
2X_PCLK input (see Figure 8-4). 


Following are state meaning and timing comments for the BCLK_EN signal. 


State Meaning Asserted— Indicates that the MPC601 must use the next rising edge 
of the internal processor clock to sample and drive the bus interface. 


Negated—Indicates that MPC60)1 outputs must not change state, 
inputs will not be sampled. This signal can be treated as a 
synchronous enable for the bus clock cycle clock. See 

Section 8.2.12, “Clock Signals." 
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Timing Comments Assertion/Negation—With appropriate setup and hold time to the 
2X_PCLK provided the rising edge of the internal processor clock 
coincides with the 2X_PCLK. 


Figure 8-4 through Figure 8-7 illustrate how the MPC60)1 clocking signals can be used to 
generate a logical bus clock. Note that the resulting logical bus clock is represented as an 
arrow coincident with the rising edge of the resulting signal. It should not be inferred that 
the duty cycle of the bus clock signal is 50 percent. 


Figure 8-4 shows how the clock inputs can be used to control the MPC6()1. Note that the 
signal IN is the output of the inverter shown in Figure 8-3. 


) #Oe-sif ote I) ee. Ji! ge: Te Ab IE SBS oh te Sp. oe ol 
ox poll 2 a Ye a 


PCCK_EN : 





IN; 


INTCLK : 





“Delay 
Figure 8-4. Generation of Internal Clock (INTCLK) 


Figure 8-4 shows a simple MPC601 clock implementation with the frequency of the logical 
bus clock equal to that of the P_CLK. 


, Oo | 1 | 2 | 38 | 4 1 5 | 6 1 7 I 


porosity i 2 a all i. 





Logical Bus Clock A A 4 
*Delay of inverter output 
Figure 8-5. Generation of Bus Transitions—Logical Bus Clock = P_CLK 


Figure 8-6 shows the generation of the logical bus clock at one-half the frequency of the 
P_CLK. 
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Figure 8-6. Generation of Bus Transitions—-Logical Bus Clock = 1/2 P_CLK 


Figure 8-7 shows how the PCLK_EN signal can be manipulated to perform cycle stretching 
on the MPC6()1. 


oO; 1/ 2]3 | 4 | [Fo Bade So 1 at 


acre LL PLL 


rom, ot 





Bus Transition : : A | ! ! A 
“Delay 
Figure 8-7. Generation of Bus Transitions—Cycle Stretching 


In this document, processor clock refers to the internal P_CLOCK signal; bus clock refers 
to the clock that causes the bus transitions. 


Figure 8-5 and Figure 8-6 show two examples of the generation of bus transitions. In the 
first example, BCLK_EN is grounded (always asserted) and the bus clock period is 
equivalent to the P_CLOCK cycle period. In the second example, the BCLK_EN input is 
driven by a clock switching at PCLK_EN/2 frequency. This allows the MPC601 bus 
interface to run at half the frequency of the CPU P_CLOCK, easing system design 
constraints. Note that the BCLK_EN input can be divided further (with respect to 
PCLK_EN), allowing an even greater ratio between the clock- and bus-cycle frequencies. 
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To operate the bus interface slower than P_CLOCK/2, BCLK_EN must be asserted only 
for the intended P_CLOCK window (for example, the duty cycle can be skewed such that 
the bus logic increments only once during each assertion of BCLK_EN). 


8.2.12.4 Real-Time Clock (RTC)—Input 


The real-time clock (RTC) signal is input only on the MPC601. Following are state 
meaning and timing comments for the RTC signal. 


State Meaning Rising Edge—Increments the 7.8125-MHz real-time clock in the 
MPC601. 


Timing Comments Duty cycle—See the MPC6()1 electrical specifications. 


8.3 Clocking in a Multiprocessor System 


Clocking in a multiprocessor system adds a level of complexity. The MPC6()1 defines the 
AC timing specifications for the chip inputs and outputs to allow for a reasonable amount 
of system-level skew and still allow the chip to meet its timing goals. These timing 
specifications can be found in the MPC601 electrical specifications. 
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Chapter 9 
System Interface Operation 


This section describes the MPC6()1 bus interface and its operation. It shows how the 
MPC601 signals, defined in Chapter 8, “Signal Descriptions,” interact to perform address 
and data transfers. 


9.1 MPC601 System Interface Overview 


The system interface performs external accesses for loading and storing data and fetching 
instructions. 


Instructions are automatically fetched from the memory system into the instruction unit 
where they are dispatched to the execution units at a maximum rate of three instructions per 
clock. Conversely, load and store instructions explicitly specify the movement of operands 
to and from the integer and floating-point register files and the memory system. 


When the MPC6()1 encounters an instruction or data access, it calculates the logical address 
(effective address) and uses the low-order address bits to check for a hit in the on-chip, 32- 
Kbyte cache. Operation of the cache is described in Section 9.1.1, “Operation of the On- 
Chip Cache.” During the cache lookup, the memory management unit (MMU) uses the 
upper-order address bits to calculate the virtual address, from which it calculates the 
physical address. The physical address bits are then compared with the corresponding 
cache tag bits to determine if a cache hit occurred. If the access misses in the cache, the 
physical address is used to access system memory. 


In addition to the loads, stores, and instruction fetches, the MPC601 performs other read 
and write operations for table searches, cache cast-out operations when least-recently used 
sectors are written to memory after a cache miss, and cache-sector snoop push-out 
operations when a modified sector experiences a snoop hit from another bus master. 


All read and write operations are handled by the memory unit, which consists of a two- 
element read queue that holds addresses for read operations, and a three-element write 
queue that contains addresses and data for write operations. To maintain coherency, the 
write queues are included in snooping. The interface allows one level of pipelining, that is, 
there can be two outstanding reads and writes at any given time. Note that these must be 
unlike operations; for example, there cannot be two outstanding explicit load operations, 
but there can be a load and an instruction fetch. Accesses are prioritized. The operation of 
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the memory unit is described in Section 9.1.2, “Operation of the Memory Unit for Loads 
and Stores.” 


Figure 9-1 shows the address path from the execution units and instruction fetcher, through 
the translation logic to the cache and system interface logic. 


The MPC601 uses separate address and data buses and a variety of control and status 
signals for performing reads and writes. The address bus is 32 bits wide and the data bus is 
64 bits wide. The interface is synchronous—all timing is derived from the equivalent of the 
rising edge of the bus clock cycle. All MPC6()1 inputs are sampled at and all outputs are 
driven from this edge. The bus can run at the full processor-clock frequency or at an integer 
division of the processor-clock speed. The MPC601 provides a TTL-compatible interface. 


9.1.1 Operation of the On-Chip Cache 


The MPC601's cache is a combined instruction and data (or unified) cache. It is a 
physically-addressed, virtually-indexed, 32-Kbyte cache with eight-way set associativity. 
The cache consists of eight sets of 128 sectors. Each 16-word cache line consists of two 
8-word sectors. Both sectors share the same line address tag. Cache coherency, however, is 
maintained for each sector, so there are separate coherency state bits for each sector. If one 
sector of the line is filled from memory, the MPC601 attempts to load the other sector as a 
low-priority bus operation. There is no guarantee that the other sector will be loaded. 


Because the cache on the MPC601 is an on-chip, write-back primary cache, the 
predominant type of transaction for most applications is burst-read memory operations, 
followed by burst-write memory operations, I/O controller interface operations, and single- 
beat (noncacheable or write-through) memory read and write operations. Additionally, 
there can be address-only operations, variants of the burst and single-beat operations 
(global memory operations that are snooped, and atomic memory operations, for example), 
and address retry activity (for example, when a snooped read access hits a modified line in 
the cache). 


The cache has one address port dedicated to instruction fetch and load/store accesses and 
one dedicated to snooping transactions on the system interface. Therefore, snooping does 
not require additional clock cycles unless a snoop hit that requires a cache status update 
occurs. 
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Figure 9-1. MPC601 Processor Block Diagram 
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9.1.2 Operation of the Memory Unit for Loads and Stores 


As shown in Figure 9-1, the memory unit includes two read-queue elements and three 
write-queue elements. The read queue buffers are used for holding addresses for read 
operations; the write queue buffers are used for holding addresses and data for write 
operations and to support such features as address pipelining, snooping, and write 
buffering, described as follows: 


* The two read-queue elements allow the system interface logic to buffer as many as 
two outstanding read operations. There are two restrictions that apply to filling the 
two read-queue elements described as follows: 


— There cannot be two outstanding load operations. 
— There cannot be two outstanding read-with-intent-to-modify instructions. 


* Note that when a read miss causes the cache to be updated, only the sector with the 
required data is guaranteed to be updated. The other sector can be updated only if 
both read-queue elements are free. The update of the other sector can be disabled by 
setting bit 26 in the HIDO register (HID[DRF)). 


* Two of the three write-queue elements, marked “A” and “B” in Figure 9-1, are 
buffers for write operations. They buffer store operations and sectors that are written 
back to memory such as when a cache location is updated after a cache miss. This 
allows the cache to be updated before the replaced sector is written back to system 
memory. 


¢ The third queue element, marked “snoop” in Figure 9-1, is provided to support high- 
priority copy-back operations that result from snoop hits to modified data (cache- 
sector snoop push-out operations while a read operation is pending on the bus). 
Snoop hits to modified data create a high-priority store operation that allows the 
processor to become bus master to store the modified data to memory, where it in 
turn is read by the snooping device. 


The data bus supports one level of pipelining. 


9.1.3 Operation of the System Interface 


Memory accesses can occur in single-beat and four-beat burst data transfers. The address 
and data buses are independent for memory accesses to support pipelining and split 
transactions. The MPC601 can pipeline as many as two transactions and has limited support 
for out-of-order split-bus transactions. 


Memory is accessed through an arbitration mechanism that allows devices to compete for 
bus mastership. This arbitration mechanism is flexible, allowing the MPC60I to be 
integrated into systems that implement various fairness and bus-parking procedures to 
avoid arbitration overhead. Additional multiprocessor support is provided through 
coherency mechanisms that provide snooping, external control of the on-chip cache and 
TLB, and support for a secondary cache. Multiprocessor software support is provided 
through the use of atomic memory operations. 
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Typically, memory accesses are weakly ordered—sequences of operations, including 
load/store string and multiple instructions, do not necessarily complete in the order they 
begin—maximizing the efficiency of the bus without sacrificing coherency of the data. The 
MPC6(0)1 allows read operations to precede store operations (except when a dependency 
exists, of course). In addition, the MPC601 may reorder high priority store operations 
ahead of lower priority store operations. Because the processor can dynamically optimize 
run-time ordering of load/store traffic, overall performance is improved. 


Note that the Synchronize (sync) or Enforce In-Order Execution of I/O (eieio) instruction 
can be used to enforce strong ordering. 


The following sections describe how the MPC601 interface operates, providing detailed 
timing diagrams that illustrate how the signals interact. A collection of more general timing 
diagrams are included as examples of typical bus operations. 


Figure 9-2 is a legend of the conventions used in the timing diagrams. 


This is a synchronous interface—all MPC601 input signals except the PCLK_EN signals 
are sampled relative to the rising edge of the bus clock cycle. Outputs are driven off the 
same rising edge of bus clock cycle (see the electrical specifications for exact timing 
information). 


9.1.4 I/O Controller Interface Accesses 
Memory and I/O controller interface accesses use the MPC601 signals differently. 


The MPC601 defines separate memory and I/O address spaces, or segments, distinguished 
by the segment register T-bit in the address translation logic of the MPC601. If the T-bit is 
cleared, the memory reference is a normal memory access and can use the virtual memory 
management hardware of the MPC601. If the T-bit is set, the memory reference is an I/O 
controller interface access. 


The function and timing of some address transfer and attribute signals (such as TTO-TT3, 
TBST, and TSIZO-TSIZ2) are changed for I/O controller interface accesses. Additional 
controls are required to facilitate transfers between the MPC6()1 and intelligent I/O devices. 
1/O controller interface and memory transfers are distinguished from one another by their 
address transfer start signals—TS indicates that a memory transfer is starting and XATS 
indicates that an I/O controller interface transaction is starting. 


Unlike memory accesses, I/O controller interface accesses cannot be pipelined and must be 
strongly ordered—each access occurs in strict program order and completes before another 
access can begin. For this reason, i/O controlier interface accesses are Icss efficient than 
memory accesses. The I/O extensions also allow for additional bus pacing and multiple 
transaction operations for variably-sized data transfers (1 to 128 bytes), and they support a 
tagged, split request/response protocol. The I/O controller interface access protocol also 
requires the slave device to function as a bus master. 
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Bar over signal name indicates active low 
ap0 MPC601 input (while MPC601 is a bus master) 
Br MPC601 output (while MPC601 is a bus master) 
ADDR+ MPC601 output (grouped: here, address plus attributes) 


qual 8G_ — MPC601 internal signal (inaccessible to the user, but 
used in diagrams to clarify operations) 


Compelling dependency—event will occur on the 
next clock cycle 


Prerequisite dependency—event will occur on an 
undetermined subsequent clock cycle 


(_) _ MPC601 three-state output 





MPC601 non-sampled input 


\ / Signal with sample point 
~ 


A sampled condition (dot on high or low state) 
° with multiple dependencies 


"s / Timing for a signal had it been asserted (it is not 
ets calla actually asserted) 





Figure 9-2. Timing Diagram Legend 


9.2 Memory Access Protocol 


Memory accesses are divided into address and data tenures. Each tenure has three phases— 
bus arbitration, transfer, and termination. The MPC6()1 also supports address-only 
transactions. Note that address and data tenures can overlap, as shown in Figure 9-3. 


Figure 9-3 shows that the address and data tenures are distinct from one another and that 
both consist of three phases—arbitration, transfer, and termination. Having independent 
address and data tenures allows address pipelining (indicated in Figure 9-3 by fact that the 
data tenure begins before the address tenure ends) and split-bus transactions to be 
implemented at the system level in multiprocessor systems. Figure 9-3 shows a data 
transfer that consists of a single-beat transfer of as many as 64 bits. Four-beat burst transfers 
of 32-byte cache sectors require data transfer termination signals for each beat of data. 
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ADDRESS TENURE 





ARBITRATION | TRANSFER | TERMINATION 


INDEPENDENT ADDRESS AND DATA 


he DATA TENURE 


ARBITRATION} SINGLE-BEAT TRANSFER | TERMINATION 


Figure 9-3. Overlapping Tenures on the MPC601 Bus for a Single-Beat Transfer 





The basic functions of the address and data tenures are as follows: 
e Address tenure 


— Arbitration: During arbitration, address bus arbitration signals are used to gain 
mastership of the address bus. 


— Transfer: After the MPC601 is the address bus master, it transfers the address 
on the address bus. The address signals and the transfer attribute signals control 
the address transfer. The address parity and address parity error signals ensure 
the integrity of the address transfer. 


— Termination: After the address transfer, the system signals that the address 
tenure is complete or that it must be repeated. 


« Data tenure 


— Arbitration: To begin the data tenure, the MPC601 arbitrates for mastership of 
the data bus. 


— Transfer: After the MPC601 is the data bus master, it samples the data bus for 
read operations or drives the data bus for write operations. The data parity and 
data parity error signals ensure the integrity of the data transfer. 


— Termination: Data termination signals are required after each data beat in a data 
transfer. Note that in a single-beat transaction, the data termination signals also 
indicate the end of the tenure, while in burst accesses, the data termination 
signals apply to individual beats and indicate the end of the tenure only after the 
final data beat. 


The MPC601 bus supports address-only transfers, which use only the address bus, with no 
data transfer involved. This is useful in multiprocessor environments where external 
control of on-chip primary caches and TLB entries is desirable. Additionally, the MPC601’s 
retry capability provides an efficient snooping protocol for systems with multiple memory 
systems (including caches) that must remain coherent. 
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9.2.1 Arbitration Signals 


Arbitration for both address and data bus mastership in a multiprocessor system is 
performed by a central, external arbiter and, minimally, by the arbitration signals shown in 
Section 8.2.1, “Address Bus Arbitration Signal”. Most arbiter implementations require 
additional signals to coordinate bus master/slave/snooping activities. Note that address bus 
busy (ABB) and data bus busy (DBB) are bidirectional signals. These signals are inputs 
unless the MPC601 has mastership of one or both of the respective buses; they must be 
connected high through pull-up resistors so that they remain negated when no devices have 
control of the buses. 


The following list describes the address arbitration signals: 


¢ BR (bus request)—Assertion indicates that the MPC601 is requesting mastership 
of the address bus. 


¢ BG (bus grant)—Assertion indicates that the MPC601 may, with the proper 
qualification, assume mastership of the address bus. A qualified bus grant occurs 
when BG is asserted and ABB and ARTRY are negated. 


If the MPC601 is parked, BR need not be asserted for the qualified bus grant. 


¢ ABB (address bus busy)— Assertion indicates that the MPC601 is the address bus 
master. 


The following list describes the data arbitration signals: 


¢ DBG (data bus grant)—Indicates that the MPC601 may, with the proper 
qualification, assume mastership of the data bus. A qualified data bus grant occurs 
when DBG is asserted while DBB, DRTRY, and ARTRY are negated. 


DBB signal is driven by the current bus master, DRTRY is only driven from the bus, 
and ARTRY is from the bus, but only for the address bus tenure associated with the 
current data bus tenure (that is, not from another address tenure). 


¢« DBWO (data bus write only)—Assertion indicates that the MPC601 may run the 
data bus tenure for an outstanding write address even if a read address is pipelined 
before the write address. If DBWO is asserted, the MPC601 only assumes data bus 
mastership for a pending data bus write operation (that is, the MPC601 does not take 
the data bus for a pending read operation if this input is asserted along with DBG). 
Care must be taken with DBWO to ensure the desired write is queued (for example, 
a cache-sector snoop push-out operation). 


¢ DBB (data bus busy)—Assertion indicates that the MPC601 is the data bus master. 
The MPC601 always assumes data bus mastership if it needs the data bus and is 
given a qualified data bus grant (see DBG). 


For more detailed information on the arbitration signals, refer to Section 8.2.1, 
“Address Bus Arbitration Signal,” and Section 8.2.6, “Data Bus Arbitration 
Signals.” 
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9.2.2 Address Pipelining and Split-Bus Transactions 


The MPC6()1 protocol provides independent address and data bus capability to support 
pipelined and split-bus transaction system organizations. Address pipelining allows new 
bus transactions to begin before the current transaction has finished by overlapping the data 
bus tenure associated with a previous address bus tenure with one or more successive 
address tenures. Split-bus transaction capability allows the address bus and data bus to have 
different masters at the same time. 


While this capability does not inherently reduce memory latency, support for address 
pipelining and split-bus transactions can greatly improve effective bus/memory 
throughput. For this reason, these techniques are most effective in shared-memory 
multiprocessor implementations where bus bandwidth is an important measurement of 
system performance. 


External arbitration is required in systems in which multiple devices must compete for the 
system bus. The design of the external arbiter affects pipelining by regulating address bus 
grant (BG), data bus grant (DBG), and AACK signals. For example, a one-level pipeline is 
enabled by asserting AACK to the current address bus master and granting mastership of 
the address bus to the next requesting master before the current data bus tenure has 
completed. Two address tenures can occur before the current data bus tenure completes. 


The MPC601 can pipeline its own transactions to a depth of one level (intraprocessor 
pipelining); however, the MPC601 bus protocol does not constrain the maximum number 
of levels of pipelining that can occur on the bus between multiple masters (interprocessor 
pipelining). The external arbiter must control the pipeline depth and synchronization 
between masters and slaves. 


In a pipelined implementation, data bus tenures are kept in strict order with respect to 
address tenures. However, external hardware can further decouple the address and data 
buses, allowing the data tenures to occur out of order with respect to the address tenures. 
This requires some form of system tag to associate the out-of-order data transaction with 
the proper originating address transaction (not defined for the MPC601 interface). 
Individual bus requests and data bus grants from each processor can be used by the system 
to implement tags to support interprocessor, out-of-order transactions. 


The MPC601 supports a limited intraprocessor out-of-order, split-transaction capability via 
the data bus write only (DBWO) signal. For more information about using DBWO, see 
Section 9.10, “Using DBWO (Data Bus Write Only).” 


S @ AAttane an Baan Ae 

9.5 AGaress BUS Tenure 

This section describes the three phases of the address tenure—address bus arbitration, 
address transfer, and address termination. 
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9.3.1 Address Bus Arbitration 


When the MPC601 needs access to the external bus and it is not parked (BG is negated), it 
asserts bus request (BR) until it is granted mastership of the bus and the bus is available 
(see Figure 9-4). The external arbiter must grant master-elect status to the potential master 
by asserting the bus grant (BG) signal. The MPC601 requesting the bus determines that the 
bus is available when the ABB input is negated. When the address bus is not busy (ABB 
input is negated), BG is asserted, and the address retry (ARTRY) input is negated. This is 
referred to as a qualified bus grant. The potential master assumes address bus mastership 
when it receives a qualified bus grant by asserting ABB. 


The MPC601 also provides an internally generated address bus busy signal, which it 
logically ORs with the ABB signal received off of the bus. This internal address bus busy 
signal is asserted with any TS or XATS signal and is negated with a valid AACK. Tis 
internally generated address bus busy signal is useful in systems that do not use ABB. 


| “1 | 0 | 1 | 
Logical Bus Clock | | | | | | | 
need_bus | 


G / 
a2 
3 





Figure 9-4. Address Bus Arbitration 


External arbiters must allow only one device at a time to be address bus master. In 
implementations in which no other device can be a master, BG can be grounded (always 
asserted) to continually grant mastership of the address bus to the MPC601. 


If the MPC601 asserts BR before the external arbiter asserts BG, the MPC601 is considered 
to be unparked, as shown in Figure 9-4, Figure 9-5 shows the parked case, where a qualified 
bus grant exists on the clock edge following a need_bus condition. Notice that the bus clock 
cycle required for arbitration is eliminated if the MPC601 is parked, reducing overall 
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memory latency for a transaction. The MPC601 always negates ABB for at least one bus 
clock cycle after AACK is asserted, even if it is parked and has another transaction pending. 


Typically, bus parking is provided to the device that was the most recent bus master; 
however, system designers may choose other schemes, such as providing unrequested bus 
grants in situations where it is easy to correctly predict the next device requesting bus 
mastership. 


2 


need_bu 


a A 








Figure 9-5. Address Bus Arbitration Showing Bus Parking 


When the MPC601 receives a qualified bus grant, it assumes address bus mastership by 
asserting ABB and negating the BR output signal. Meanwhile, the MPC601 drives the 
address for the requested access onto the address bus and asserts TS to indicate the start of 
a new transaction. 


When designing external bus arbitration logic, note that the MPC601 may assert BR 
without using the bus after it receives the qualified bus grant. For example, in a system 
using bus snooping, if the MPC601 asserts BR to perform a replacement copy-back 
operation, another device can invalidate that sector before the MPC601 is granted 
mastership of the bus. Once the MPC6()1 is granted the bus, it no longer needs to perform 
the copy-back operation; therefore, the MPC601 does not assert ABB and does not use the 
bus for the copy-back operation. Note that the MPC601 asserts BR for at least one clock 
cycle in these instances. 
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9.3.2 Address Transfer 


During the address transfer, the physical address and all attributes of the transaction are 
transferred from the bus master to the slave device(s). Snooping logic may monitor the 
transfer to enforce cache coherency (see discussion about snooping in Section 9.3.3, 
“Address Transfer Termination”). The signals used in the address transfer include the 
following signal groups (see Figure 8-1): 


¢ Address transfer start signal: Transfer start (TS) 


Note that extended address transfer start (XATS) is used for I/O controller interface 
operations and has no function for memory accesses. See Section 9.6, “I/O 
Controller Interface Operation.” 


¢ Address transfer signals: Address bus (AQ—A31), address parity (APO—AP3), and 
address parity error (APE) 


¢ Address transfer attribute signals: Transfer type (TTO-TT4), transfer code (TCO-— 
TC3), transfer size (TSIZO-TSIZ2), transfer burst (TBST), cache inhibit (CT), write- 
through (WT), global (GBL), and cache set element (CSEO-CSE2) 


Figure 9-6 shows that the timing for all of these signals, except TS and APE, is identical. 
All of the address transfer and address transfer attribute signals are combined into the 
ADDR+ grouping in Figure 9-6. The TS signal indicates that the MPC601 has begun an 
address transfer and that the address and transfer attributes are valid (within the context of 
a synchronous bus). The MPC601 always asserts TS (or XATS for I/O controller interface 
operations) coincident with ABB in multiprocessor systems. As an input, TS need not 
coincide with the assertion of ABB on the bus (that is, either TS or XATS can be asserted 
with, or on, a subsequent clock cycle after ABB is asserted; the MPC601 tracks this 
transaction correctly). 






ADD Ft } 
aac 


artry_in ! 


Figure 9-6. Address Bus Transfer 
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In Figure 9-6, the address transfer occurs during bus clock cycles 1 and 2 (arbitration 
occurs in bus clock cycle 0) and the address transfer is terminated in bus clock 3). In this 
diagram, the address bus termination input, AACK, is asserted to the MPC601 on the bus 
clock following assertion of TS (as shown by the dependency line). This is the minimum 
duration of the address transfer for the MPC601; the duration can be extended by delaying 
the assertion of AACK for one or more bus clocks. 


9.3.2.1 Address Bus Parity 


The MPC601 always generates one bit of correct odd-byte parity for each of the four bytes 
of address when a valid address is on the bus. The calculated values are placed on the APQ- 
AP3 outputs when the MPC601 is the address bus master. If the MPC601 is not the master 
and TS and GBL are asserted together (qualified condition for snooping memory 
operations), the calculated values are compared with the APOQ-AP3 inputs. If there is no 
match, the APE output is asserted. An address bus parity error causes a checkstop condition 
if the bus parity checkstop source is enabled in HIDO. See Chapter 5, “Exceptions.” 


The internal address parity error signal is further qualified by a valid address condition, 
since the APE signal may be asserted for invalid address bus conditions. APE does not, 
therefore, necessarily represent the state of the internal address parity error signal used to 
generate the machine check exception. 


9.3.2.2 Address Transfer Attribute Signals 


The transfer attribute signals include several encoded signals such as the transfer type 
(TTO-TT4) signals, transfer burst (TBST) signal, transfer size (TSIZO-TSIZ2) signals, and 
transfer code (TCO-TC1) signals. Section 8.2.4, “Address Transfer Attribute Signals,” 
describes the encodings for the address transfer attribute signals. Note that TTO-TT4, 
TBST, and TSIZO-TSIZ2 have alternate functions for I/O controller interface operations 
(see Section 9.6, “I/O Controller Interface Operation).” 


9.3.2.2.1 Transfer Type (TTO-TT4) Signals 


Snooping logic should fully decode the transfer type signals if the GBL signal is asserted. 
Slave devices can use the individual transfer type signals without fully decoding the group. 
The transfer type signals generally have the following individual functions: 


¢ TTO—Special operations: The MPC601 drives this signal to indicate that the access 
is part of an atomic data access sequence. This signal is asserted by the MPC6()1 
whenever a bus transaction occurs in response to a lwarx/stwex. (Load Word and 
Reserve Indexed/Store Word Conditional Indexed) instruction pair (see Chapter 3, 
“Addressing Modes and Instruction Set Summary”), an eciwx or ecowx instruction, 
or for a Translation Look-Aside Buffer Invalidate Entry (tlbie) operation. 


¢ TT1i—Read (/write) operations: The TT1 signal indicates whether the transaction is 
aread (TT1 high) or a write (TT1 low) transaction. This is valid for transactions that 
are not address only 
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¢ TT2—Invalidate operations: When asserted with GBL, the TT2 output signal 
indicates that all other caches in the system should invalidate the cache entry on a 
snoop hit. If the snoop hit is to a modified entry, the sector should be copied back 
before being invalidated. 


¢* TT3—Memory (/address-only) operations: Except for eciwx or ecowx instructions 
(TTO—TT3 encodings 1010 or 1110) the TT3 signal, when asserted, indicates that the 
associated data transfer is to/from memory. External logic can synthesize the data 
bus request from the combination of TS (or XATS) and TT3 (DBR=TS&TT3). If 
TT3 is not asserted with the address, the associated bus transaction is considered to 
be a broadcast operation that all bus participants must honor (or a reserved 
operation). This is an address-only transaction; the MPC601 does not need and will 
not acquire data bus ownership, even if it receives a qualified data bus grant. 
Figure 9-7 shows an address-only transaction. On the rising edge of bus cycle 2, TT3 
is not asserted; therefore, the data bus will not be needed. 


¢ TT4—The TT4 signal is reserved for future expansion. 





Figure 9-7. Address-Only Bus Transaction 


9.3.2.2.2 Transfer Size (TSIZO-TSIZ2) Signals 

The transfer size signals (TSIZO-TSIZ2) indicate the size of the requested data transfer as 
shown in Table 9-1. The TSIZO-TSIZ2 signals may be used along with TBST and A29- 
A31 to determine which portion of the data bus contains valid data for a write transaction 
or which portion of the bus should contain valid data for a read transaction. Note that for a 
burst transaction (as indicated by the assertion of TBST) TSIZO-TSIZ2 are always set to 
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b‘010". Therefore, if the TBST signal is asserted, the memory system should transfer a total 
of eight words (32 bytes), regardless of the TSIZO-TSIZ2 encoding. 


Table 9-1. Transfer Size Signal Encodings 


[reer [re [as [ro [ ter 
a 
ee 
ee 
a 
ee 
a 
ee 
ee 


The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cache 
sector). Data transfers that cross an aligned, 32-byte boundary either must present a new 
address onto the bus at that boundary (for coherency consideration) or must operate as non- 
coherent data with respect to the MPC601. 











9.3.2.3 Effect of Alignment in Data Transfers 

Table 9-2 lists the aligned transfers that can occur on the MPC601 bus. These are transfers 
in which the data is aligned to an address that is an integer multiple of the size of the data. 
For example, Table 9-2 shows that one-byte data is always aligned; however, for a four- 
byte word to be aligned, it must be oriented on an address that is a multiple of four. 


Table 9-2. Aligned Data Transfers 


Data Bus Byte Lane(s) 
Transter size | rsi2o | rs12t | Ts1z2 eee tet 


Riches 
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Table 9-2. Aligned Data Transfers (Continued) 


Data Bus Byte Lane(s) 


Notes: 


V The byte portions of the requested operand that are read or written during that bus transaction. 


— These entries are not required and are ignored during read transactions and are driven with undefined 

data during all write transactions (except non-cacheable write transfers, in which data is mirrored on both 

word lanes if the transfer does not exceed four bytes). 
The MPC601 also supports misaligned memory operations. These transfers address 
memory that is not aligned to the size of the data being transferred (such as, a word read of 
an odd byte address). Although most of these operations hit in the primary cache (or 
generate burst memory operations if they miss), the MPC601 interface supports misaligned 
transfers within a double-word (64-bit aligned) boundary, as shown in Table 9-3. Note that 
the three-byte transfer in Table 9-3 is only one example of misalignment. As long as the 
attempted transfer does not cross a double-word boundary, the MPC601 can transfer the 
data on the misaligned address (for example, a word read from an odd byte-aligned address, 
or a seven-byte read from an odd byte-aligned address). 


An attempt to address data that crosses a double-word boundary requires two bus transfers 
to access the data. This is illustrated in the last example of a three-byte transfer in Table 9-3. 
The transfer requires two accesses—the first for the last two bytes of one double-word 
address, the second for one byte from the next double-word address. The TBST, TSIZO- 
TSIZ2, and A29-A31 signals provide enough information to determine the size of the 
transfer and the data bus byte lanes involved in the misaligned transfer. 


Although misaligned transfers are supported, they may degrade performance substantially. 
In addition to the double-word straddle boundary condition, the address translation logic 
can generate substantial exception overhead when the microcoded, sequenced, load/store 
multiple and load/store string instructions access misaligned data. It is strongly 
recommended that software attempt to align code and data where possible. 
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Table 9-3. Misaligned Data Transfer (Three-Byte Examples) 


Data Bus Byte Lanes 
Transfer Size 
TS!Z(0-2) 


Three bytes 01 


_ 


0 
011 
011 
011 
011 


First transfer: 010 
two bytes 


Second 
transfer: 
one byte 


First transfer: 
one byte 


Second 
transfer: 
two bytes 





A: Byte lane used 
—: Byte lane not used 


9.3.2.4 Transfer Code (TCO-TC1) Signals 

The TCO and TC1 signals provide supplemental information about the corresponding 
address. Note that the TCx signals can be used with the TTO-TT4 and TBST signals to 
further define the current transaction. These encodings may be useful for debugging. 


The meaning of TCO depends on whether the current transaction is a read or write 
operation. On aread operation, TCO asserted indicates that the transaction is an instruction 
fetch operation; otherwise, the read operation is a data operation. On an MPC601 write 
operation, TCO asserted indicates that the write transfer is invalidating the associated sector. 
This is for a copy-back replacement, a Data Cache Block Flush instruction (dcbf), snoop 
that causes invalidation (for example a flush or kill). TCO negated indicates the write is not 
invalidating any cache sector (for example, a replacement sector copy-back, write-through, 
or cache-inhibited write operation.) 


The TC1 signal is asserted on read and RWITM operations to indicate that a low-priority 
operation to load the sector adjacent to one that was previously loaded due to a cache miss 
is queued; therefore, the next bus transaction will likely access the same page of memory. 
This operation may not be the next transaction if, for instance, a copy-back operation that 
resulted from a snoop hit is required. Note that TC1 asserted indicates to the memory 
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system the likelihood that the next access is on the same page, but it does not guarantee this 
will occur because of transfer priorities and the bus traffic/code execution dynamics. TC] 
is negated for all write operations on the bus. 


Table 9-4 shows the encodings of the TCO and TC1 signals. 


Table 9-4. Transfer Code Signal Encodings 












Bus operation is an instruction fetch 
Write: Operation is invalidating the cache line in the MPC601. 
Kill (address only): Operation fs invalidating the cache line in the MPC601. 


. 
Negated 


TC1 Asserted The next access is likely to be on same page. Asector has been loaded, anda 
low-priority load of the adjacent sector is queued. 
Negated The next access is not likely to be on the next page; an optional low-priority 
load of an adjacent sector is not queued. 


9.3.3 Address Transfer Termination 


The MPC601 does not terminate the address transfer until the AACK (address 
acknowledge) input is asserted; therefore, the system can extend the address transfer phase 
by delaying the assertion of AACK to the MPC601. Although AACK can be asserted as 
early as the bus clock cycle following TS (see Figure 9-8), MPC601 address transfers 
require a minimum of three bus clock cycles—enough time to negate and tristate the shared 
ARTRY and SHD signals with no contention between devices. As shown in Figure 9-8, 
these signals are asserted for one bus clock cycle, tristated for the next bus clock cycle, 
driven high for the next 2X_PCLK cycle time, and finally tristated. Note that AACK is 
asserted for only one bus clock cycle. 


Bus operation is a data read 
Write: Operation is not invalidating the cache line in the MPC601. 
Kill (address only): Operation is not invalidating the cache line. 








Note that precharging of the ARTRY and SHD signals during the negation period can be 
disabled by enabling HID[29]. After ARTRY and SHD are negated, they will be three- 
stated for two bus cycles and the system is responsible for precharging both ARTRY and 
SHD signals. This allows masters in a system that uses both 3.6-V and 5-V levels to use the 
same system bus. 


The address transfer can be terminated with the requirement to retry if ARTRY is asserted 
during the bus clock cycle following AACK. If ARTRY is asserted in this window, the 
MPC601 negates BR in the following bus clock cycle; after that, it attempts to retry the 
address transfer. By delaying the bus request by one bus clock cycle, the protocol provides 
an opportunity for the snooping device that asserted the ARTRY to access the bus next, and 
therefore retry determinacy is possible. In order for the retry determinacy to be guaranteed, 
however, the external bus arbitration logic must ensure that the snooping device has access 
to the bus next. 





9-18 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


The only valid window for the ARTRY input is the one bus clock cycle following the 
assertion of AACK. Snooping devices must monitor the assertion of AACK to know when 
to deassert/tristate ARTRY, as shown in Figure 9-8. The assertion of ARTRY/SHD can be 
derived in one of the following ways: 


¢ ARTRY/SHD can be asserted on the second clock after TS is asserted 


¢ ARTRY/SHD can be asserted before AACK is asserted, but is not qualified by the 
master MPC601 until the clock after AACK is asserted 





aac 
SHD : 


qualiG : 





Figure 9-8. Snooped Address Cycle with ARTRY 


The MPC601 requires that the first (or only) TA not be asserted before AACK (note that 
TA can be held off directly by the slave device delaying AACK assertion or indirectly by 
an external arbiter delaying DBG assertion). This requirement guarantees the relationship 
between TA and ARTRY/SHD such that, in the case of an address retry, the MPC601 can 
purge the data/instructions from its data path queues and waive off the data/instructions 
before they are forwarded to the cache/CPU. 


When the data tenure begins before the address tenure is complete, if the MPC6()1 has 
asserted DBB, assertion of ARTRY causes the MPC601 to terminate the data bus 
transaction and retry both the address and data tenures later. If the transfer is a single-beat 
transfer and TA occurs as early as the AACK window, there is no indication of an early data 
bus termination. However, if a burst transaction is in progress, the MPC60)1 negates DBB 
early in response to ARTRY. The system logic does not need to assert TA for four bus clock 
cycles in this case. 
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If DBG is not asserted until the ARTRY window and ARTRY is asserted, the MPC601 does 
not become data bus master. Note that some system designs, such as single-master systems, 
do not require the use of ARTRY. 


For information about ARTRY scenarios, see Section 9.3.3.1, “Address Retry Sources.” 
For information about MESI protocol and its effect on address tenure termination, refer to 
Section 9.4.4, “Memory Coherency—MESI Protocol.” 


9.3.3.1 Address Retry Sources 


The assertion of the SHD and ARTRY input signals provide sufficient information for the 
appropriate handling of cache sector coherency. They encode information about a 
transaction, as shown in Table 9-5. 


Table 9-5. Address Retry Causes 


a 
High impedance High impedance Exclusive. No snoop hit. Pipeline not busy. 


Negated Pipeline busy. Queuing retry 
Asserted Snoop hit (shared) 


If the SHD and ARTRY inputs are not asserted for a cache-sector fill operation, the sector 
is marked as exclusive (see Section 9.4.4, “Memory Coherency—MESI Protocol”). If the 
SHD input is asserted without ARTRY, the sector is marked as shared. 





NOTE: If the invalidate (TT2) input signal is asserted for the transaction, the sector is 
marked exclusive regardless of the state of the SHD signal. If ARTRY is asserted without 
SHD, a device cannot service the address transaction currently (because of queuing 
constraints) and the transaction is retried later. The MPC6(01 reacts to the assertion of 
ARTRY the same way, regardless of the state of SHD. The timing of the SHD input is the 
same as the timing for ARTRY. 


One or more devices can indicate a queuing retry condition by asserting ARTRY while one 
or more devices separately indicate the snoop-hit shared condition by asserting SHD. This 
condition appears as a snoop hit modified condition on the bus, since both SHD and 
ARTRY are asserted. This is not a problem for the MPC601 since ARTRY is not qualified 
by SHD (that is, SHD is a don't care if ARTRY is asserted to the MPC601). 


9.4 Data Bus Tenure 


This section describes the data bus arbitration, transfer, and termination phases defined by 
the MPC60)1 memory access protocol. The phases of the data tenure are identical to those 
of the address tenure, underscoring the symmetry in the control of the two buses. 
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9.4.1 Data Bus Arbitration 


Data bus arbitration uses the data arbitration signal group, that is, DBG, DBWO, and DBB. 
Additionally, the combination of TS or XATS and TT3 (address-only signal) function as a 
data bus request. 


The TS signal is an implied data bus request from the MPC601; the arbiter must qualify TS 
with the transfer type (TT) encodings to determine if the current address transfer is an 
address-only operation, which does not require a data bus transfer (see Figure 9-7). If the 
data bus is needed, the arbiter grants data bus mastership by asserting the DBG input to the 
MPC601. As with the address-bus arbitration phase, the MPC6(0)1 must qualify the DBG 
input with a number of input signals before assuming bus mastership, as shown in 
Figure 9-9, 





Figure 9-9. Data Bus Arbitration 


A qualified data bus grant can be expressed as the following: 
QDBG = DBG asserted while DBB, DRTRY, and ARTRY are negated 


When a data tenure overlaps with its associated address tenure, a qualified ARTRY 
assertion coincident with a data bus grant does not result in data bus mastership (DBB is 
not asserted). Otherwise, the MPC601 always asserts DBB on the bus clock cycle after 
recognition of a qualified data bus grant. Since the MPC601 can pipeline transactions, there 
may be an outstanding data bus transaction when a new address transaction is retried. In 
this case, the MPC6()1 becomes the data bus master to complete the previous transaction. 
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9.4.1.1 Using the DBB Signal 


The DBB signal should be connected between masters only if data tenure hand-off is left 
to the masters. The memory system can control data hand-off directly with DBG. 


The MPC601 asserts DBB throughout the data transaction; however, the MPC60)1 does not 
park the data bus and assert DBB across multiple transactions. DBB is negated on the bus 
clock cycle after a final TA is received from the bus. 


9.4.2 Data Transfer 


The data transfer signals include DHO-DH31, DLO-DL31, DPO-DP7 and DPE. For 
memory accesses, the DH and DL signals form a 64-bit data path for read and write 
operations. 


The MPC601 transfers data in either single- or four-beat burst transfers. Single-beat 
operations can transfer from one to eight bytes at a time and can be misaligned (see 
Section 9.3.2.3, “Effect of Alignment in Data Transfers”). Burst operations always transfer 
eight words and are aligned to four- or eight-word address boundaries. Burst transfers can 
achieve significantly higher bus throughput than single-beat operations. 


The type of transaction initiated by the MPC601 depends on whether the code or data is 
cacheable and, for store operations, whether the cache is operated in write-back or write- 
through mode which the MMU controls at either the page or block basis. Burst transfers 
support cacheable operations only; that is, memory structures must be marked as cacheable 
(and write-back for data store operations) in the respective TLB entry to take advantage of 
burst transfers. 


The MPC601 output TBST indicates to the system whether the current transaction is a 
single- or four-beat transfer. A burst transfer has an assumed address order. For load or store 
operations that miss in the cache (and are marked as cacheable and, for stores, write-back 
in the MMU), the MPC6(0)1 presents the quad-word—aligned address associated with the 
critical code or data that initiated the transaction. This minimizes latency by allowing the 
critical code or data to be forwarded to the processor before the rest of the sector is filled. 
For all other burst operations, however, the sector is transferred beginning with the oct- 
word aligned data. Note that this difference can complicate cache-to-cache 
implementations. 


The MPC601 does not directly support interfacing to subsystems with less than a 64-bit 
data path (except for I/O controller interface operations, which are discussed in Section 9.6, 
“1/O Controller Interface Operation’). However, the MPC6()1 duplicates, or mirrors, the 
transfer data on the unused word lane, for store operations to pages marked as non- 
cacheable. This means, for example, that for a non-cacheable byte store operation, the valid 
byte is present on two byte lanes—one in the upper word and one in the lower word. For a 
word store operation, the word is mirrored across both word lanes. Unused byte lanes are 
undefined. 
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The data is not mirrored, however, for other store operations (including write-through). A 
cache hit causes the double word of data containing the data being transferred to be output 
on the data bus lanes. 


CAUTION 


While this information may be useful to some applications that 
do not cache data structures, data mirroring may not be 
supported on future versions of the MPC601 or other PowerPC 
processors. 


9.4.3 Data Transfer Termination 


Four signals are used to terminate data bus transactions: TA, DRTRY (data retry), TEA 
(transfer error acknowledge), and in some cases ARTRY. The TA signal indicates normal 
termination of data transactions. DRTRY indicates invalid read data in the previous bus 
clock cycle. TEA indicates a non-recoverable bus error event. ARTRY can also terminate 
a data bus transaction, only if it occurs before the first assertion of TA. 


9.4.3.1 Normal Single-Beat Termination 
Normal termination of a single-beat data read operation occurs when TA is asserted by a 


responding slave. The TEA and DRTRY signals must remain negated during the transfer 
(see Figure 9-10). 





Figure 9-10. Normal Single-Beat Read Termination 


Normal termination of a single-beat data write transaction occurs when TA is asserted by 
a responding slave. TEA must remain negated during the transfer. The DRTRY signal is 
not sampled during data writes, as shown in Figure 9-11. As shown in both Figure 9-10 and 
Figure 9-11, the TT1 signal driven low by the MPC60)1 indicates a write is in progress. 
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Figure 9-11. Normal Single-Beat Write Termination 


Normal termination of a burst transfer occurs when TA is asserted during four bus clock 
cycles, as shown in Figure 9-12. The bus clock cycles need not be consecutive, thus 
allowing pacing of the data transfer beats. For read bursts to terminate successfully, TEA 
and DRTRY must remain negated during the transfer. For write bursts, TEA must remain 
negated during the transfer. DRTRY is ignored during data writes. 





Figure 9-12. Normal Burst Transaction 


For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal 
that the data presented with TA is invalid and that the processor must wait for the negation 
of DRTRY before forwarding data to the processor (see Figure 9-13). Thus, a data beat can 
be speculatively terminated with TA and then one bus clock cycle later confirmed with the 
negation of DRTRY. The DRTRY signal is valid only for read transactions. TA must be 
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asserted on the bus clock cycle before the first bus clock cycle of the assertion of DRTRY; 
otherwise the results are undefined. 


The DRTRY signal extends data bus mastership such that other processors cannot use the 
data bus until DRTRY is negated. Therefore, in the example in Figure 9-13, DBB cannot 
be asserted until bus clock cycle 5. This is true for both read and write operations even 
though DRTRY does not hold the master on write operations. 





Figure 9-13. Termination with DRTRY 


Figure 9-14 shows the effect of using DRTRY during a burst read. It also shows the effect 
of using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 9-14, 
TA is negated for the second data beat. The MPC6()1 data pipeline does not proceed until 
bus clock cycle 4 when the TA is reasserted. 


Note that DRTRY is useful for systems that implement speculative forwarding of data such 
as those with direct-mapped, second-level caches where hit/miss is determined on the 
following bus clock cycle, or for parity- or ECC-checked memory systems. 


Note that DRTRY may not be implemented on other PowerPC processors. 


9.4.3.2 Data Transfer Termination Due to a Bus Error 

The TEA signal indicates that a bus error occurred. It may be asserted while DBB (and/or 
DRTRY for read operations) is asserted. Asserting TEA to the MPC6()1 terminates the 
transaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated. 
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Figure 9-14. Read Burst with TA Wait States and DRTRY 


Assertion of the TEA signal causes a machine-check exception (and possibly a check-stop 
condition within the MPC601). For more information, see Section 5.4.2, “Machine Check 
Exception (x’00200’).” However assertion of TEA does not invalidate data entering the 
GPR or the cache; therefore, the MPC601 may act on invalid code/data (although the 
exception will eventually be recognized, if enabled). Additionally, the corresponding 
address of the access that caused TEA to be asserted is not latched by the MPC601. To 
recover from this condition, the MPC6()1 must be reset; therefore, this function should only 
be used to flag fatal system conditions to the processor (such as parity or uncorrectable ECC 
errors). 


After the MPC601 has committed to run a transaction, that transaction must eventually 
complete. The separate address and data bus grants and address retry cause the transaction 
to be restarted; TA wait states and DRTRY assertion for reads delay termination of 
individual data beats. Eventually, however, the system must either terminate the transaction 
or assert the TEA signal to put the MPC601 into checkstop mode. For this reason, care must 
be taken to check for the end of physical memory and the location of certain system 
facilities. 


Note that TEA generates a machine-check exception depending on the ME bit in the MSR. 
Setting the checkstop enable control bits properly leads to a true checkstop condition. 


Note also that the MPC601 does not implement a synchronous error capability for memory 
accesses (see Section 9.6, “I/O Controller Interface Operation”). This means that the 
exception instruction pointer does not point to the memory operation that caused the 
assertion of TEA, but to the instruction about to be executed (perhaps several instructions 
later). 
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9.4.4 Memory Coherency—MESI Protocol 


The MPC601 provides dedicated hardware to provide memory coherency by snooping bus 
transactions. The address retry capability enforces the four-state, MESI cache-coherency 
protocol (see Figure 9-15). In addition to the hardware required to monitor bus traffic for 
coherency, the MPC60)1 has a cache port dedicated to snooping so that comparing cache 
entries to address traffic on the bus does not tie up the MPC601's on-chip cache. 


The global (GBI) signal output, indicates whether the current transaction must be snooped 
by other snooping devices on the bus. Address bus masters assert GBL to indicate that the 
current transaction is a global access (that is, an access to memory shared by more than one 
processor/cache). If GBL is not asserted for the transaction, that transaction is not snooped. 
When other devices detect the GBL input asserted, they must respond by snooping the 
broadcast address. 


Normally, GBL reflects the M-bit value specified for the memory reference in the 
corresponding translation descriptor(s). Note that care must be taken to minimize the 
number of pages marked as global, because the retry protocol discussed in the previous 
section is used to enforce coherency and can require significant bus bandwidth. 


When the MPC601 is not the address bus master, GBL is an input. The MPC601 snoops a 
transaction if TS and GBL are asserted together in the same bus clock cycle (this is a 
qualified snooping condition). No snoop update to the MPC601 cache occurs if the snooped 
transaction is not marked global. This includes invalidation cycles. 


When the MPC6()1 detects a qualified snoop condition, the address associated with the TS 
is compared against the unified cache tags through a dedicated cache-tag port. Snooping 
completes if no hit is detected. If, however, the address hits in the cache, the MPC60)1 reacts 
according to the MESI protocol shown in Figure 9-15, assuming the WIM bits are set to 
write-back mode, caching allowed, and coherency enforced (WIM = 001). 


Note that, in Figure 9-15, write hits to clean lines of non-global pages do not generate 
invalidate broadcasts. There are several types of bus transactions that involve the 
movement of data that can no longer access the TLB M-bit (for example, replacement 
sector copy-back, snoop push, and table-search operations). In these cases, the hardware 
cannot determine whether the sector was originally marked global; therefore, the MPC601 
marks these transactions as non-global to avoid retry deadlocks. 


The MPC6()1's on-chip cache is implemented as an eight-way set-associative cache. To 
facilitate external monitoring of the internal cache tags, the cache set element (CSE0- 
CSE2) signals indicate which sector of the cache set is being replaced on read operations 
(including RWITM). Note that these signals are valid only for MPC601 burst operations; 
for all other bus operations, the CSE signals should be ignored. Table 9-6 shows the CSE 
encodings. 
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Figure 9-15. MESI Cache Coherency Protocol—State Diagram (WIM = 001) 


Table 9-6. CSE(0-2) Signals 


CSE0-CSE2 Cache Set Element 
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9.5 Timing Examples 


This section shows timing diagrams for various scenarios. For information about 
conventions used in these diagrams, refer to Figure 9-2. 


Figure 9-16 illustrates the fastest single-beat reads. Note that all bidirectional signals go to 
high-impedance between bus tenures. 
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Figure 9-16. Fastest Single-Beat Reads 
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Figure 9-17 illustrates the fastest single-beat writes. Note that all bidirectional signals go to 
high-impedance between bus tenures. TTO-TT3 are binary encoded b’x001’. TTO can be 
either 0 or 1, TT1 and TT2 are 0, and TT3 is 1. 
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Figure 9-17. Fastest Single-Beat Writes 


Figure 9-18 shows three ways to delay single-beat reads showing data-delay controls: 
¢ The TA hold-off can be used to insert wait states in clock cycles 3 and 4. 
¢ For the second access, DBG could have been asserted in clock cycle 6. 
¢ Inthe third access, DRTRY is asserted in clock cycle 11 to flush the previous data. 
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Note that all bidirectional signals go to high-impedance between bus tenures. Note also that 
two loads cannot be pipelined. The pipelining shown in Figure 9-17 can occur if the second 
access is not another load, (for example, an instruction fetch). 
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Figure 9-18. Single-Beat Reads Showing Data-Delay Controls 
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Figure 9-19 shows data-delay controls in a single-beat write. Note that all bidirectional 
signals are set to high impedance between bus tenures. Data transfers are delayed in the two 
following ways: 

* The TA holdoff is used to insert wait states in clocks 3 and 4. 

¢ Inclock 6, DBG is held negated, delaying the start of the data tenure. 


The last access is not delayed (DRTRY is valid only for read operations). 
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Figure 9-19. Single-Beat Writes Showing Data Delay Controls 
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Figure 9-20) shows three single-beat transfers back-to-back. Note that all bidirectional 
signals are set at high-impedance state between tenures. 
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Figure 9-20. Back-to-Back Single-Beat Transfers 
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Figure 9-21 shows the use of data-delay controls with burst transfers. Note that all 
bidirectional signals are set to high impedance between bus tenures. Note the following: 

¢ The first data beat of bursted read data (clock 0) is the critical quad word. 

* The write burst shows the use of TA holdoff on the third data beat. 

¢ The final read burst shows the use of DRTRY on the third data beat. 

¢ The address for the third transfer is held off until the first transfer completes. 
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Figure 9-21. Burst Transfers with Data Delay Controls 
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Figure 9-22 shows the use of the TEA signal. Note that all bidirectional signals are set to 
high impedance between bus tenures. Note the following: 


¢ The first data beat of the read burst (in clock (0) is the critical quad-word. 
¢ The TEA signal truncates the burst write transfer on the third data beat. 
¢ The MPC601 eventually interrupts on the TEA event. 
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Figure 9-22. Use of Transfer Error Acknowledge (TEA) 


9.6 I/O Controller Interface Operation 


The MPC6()1 defines separate memory and I/O address spaces, or segments, distinguished 
by the segment register T-bit in the address translation logic of the MPC601. If the T-bit is 
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cleared, the memory reference is a normal memory access and can use the virtual memory 
management hardware of the MPC6()1. If the T-bit is set, the memory reference is an I/O 
controller interface access. 


There are several architectural ramifications of I/O controller interface accesses, such as the 
following: 


1/O controller interface accesses must be strongly ordered; for example, these 
accesses must run on the bus strictly in order with respect to the instruction stream. 


I/O controller interface accesses must provide synchronous error reporting. 
Chapter 4, “Cache and Memory Unit Operation,” describes architectural aspects of 
I/O controller interface segments, as well as an overview of the PowerPC's 
segmented address space management. 


The MPC601 defines two types of I/O controller interface segments (segment register T-bit 
set) based on the value of the bus unit ID (BUID). See Section 9.6.2, “I/O Controller 
Interface Transaction Protocol Details,” for more information about the BUID. 


I/O controller interface (BUID # x'07F')—I/O controller interface accesses include 
all transactions between the MPC601 and subsystems (referred to as bus unit 
controllers (BUCs) mapped through I/O controller interface address space). 


Memory-forced I/O controller interface (BUID = x'07F')—Memory-forced I/O 
controller interface operations access memory space. They do not use the extensions 
to the memory protocol described for I/O controller interface accesses, and they 
bypass the page- and block-translation and protection mechanisms. The physical 
address is found by concatenating bits 28-31 of the respective segment register with 
bits 4-31 of the effective address. This address is marked as non-cacheable, write- 
through, and global. 


Because memory-forced I/O controller interface accesses address memory space, 
they are subject to the same coherency control as other memory reference 
operations. More generally, accesses to memory-forced I/O controller interface 
segments are considered to be cache-inhibited, write-through and memory-coherent 
operations with respect to the MPC6()1 cache and bus interface. 


The MPC601 has a single bus interface to support accesses to both memory accesses and 
I/O controller interface segment accesses. 


9-36 
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The system recognizes the assertion of the TS signal as the start of a memory access. The 
assertion of XATS indicates an I/O controller interface access. This allows memory devices 
to ignore I/O controller interface transactions. If XATS is asserted, the access is to I/O 
space and the following extensions to the memory access protocol apply: 


¢ Anew set of bus operations are defined. The transfer type, transfer burst, and 
transfer size signals are redefined for I/O controller interface operations; they 
convey the opcode for the I/O transaction (see Table 9-7). 


¢ There are two beats of address for each I/O controller interface transfer. The first 
beat (packet () provides basic address information such as the segment register and 
the sender tag; the second beat (packet 1) provides additional addressing bits and 
several control bits from the segment register. 


¢ Explicit sender/receiver tags are provided. 


¢ The sender that initiated the transaction must wait for a reply from the receiver bus- 
unit controller (BUC) before starting a new operation. 


* The MPC601 does not burst I/O controller interface transactions, but streaming is 
permitted. Streaming (in this context) allows multiple single-beat transactions to 
occur before a reply from the I/O receiver is required. 


I/O controller interface transactions use separate arbitration for the split address and data 
buses and define address-only and single-beat transactions. The address-retry vehicle is 
identical, although there is no hardware coherency support for I/O controller interface 
transactions. ARTRY is useful, however, for pacing MPC601 transactions, effectively 
indicating to the MPC601 that the BUC is in a queue-full condition and cannot accept new 
data. 


In addition to the extensions noted above, there are fundamental differences between 
memory and I/O controller interface operations. For example, use of DRTRY is undefined 
for MPC601 I/O controller interface operations. Additionally, only half of the 64-bit data 
path is available for MPC601 I/O controller interface transactions. This lowers the pin- 
count for I/O interfaces but generally results in substantially less bandwidth than memory 
accesses. Additionally, load/store instructions that address I/O controller interface 
segments cannot complete successfully without an error-free reply from the addressed 
BUC. Because normal I/O controller interface accesses involve multiple I/O transactions 
(streaming), they are likely to be very long latency instructions; therefore, I/O controller 
interface operations usually stall MPC601 instruction issue. 


Figure 9-23 shows an I/O controller interface tenure. Note that the I/O response is an 
address-only bus transaction. 


The decision on whether to map I/O peripherals into memory or I/O controller interface 
space depends on many factors; however, it should be noted that in the best case, the use of 
the MPC6()1 I/O controller interface protocol degrades performance and requires the 
addressed controllers to implement MPC601 bus master capability to generate the reply 
transactions. 
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Figure 9-23. I/O Controller Interface Tenures 


9.6.1 I/O Controller Interface Transactions 


Seven I/O controller interface transaction operations are defined by the MPC601, as shown 
in Table 9-7. These operations permit communication between the MPC601 and BUCs. A 
single MPC6()1 store or load instruction (that translates to an I/O controller interface 
access) generates one or more I/O controller interface operations (two or more I/O 
controller interface operations for loads) from the MPC601 and one reply operation from 
the addressed BUC. 


Table 9-7. I/O Controller Interface Bus Operations 


Operation Address Only | Direction =| 


Load start (request) 


10 = MPC601 


Load reply 
Store reply 
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For the first beat of the address bus, the extended address transfer code (XATC), contains 
the I/O opcode as shown in Table 9-8; the opcode is formed by concatenating the transfer 
type, transfer burst, and transfer size signals defined as follows: 


XATC = TT(0-3)IITBSTITSIZ(O-2) 
Table 9-8. I/O Controller Interface Bus Operations (XATC Encodings) 


Operation . XATC 


Load request 0100 0000 


Load immediate 0101 0000 
Load last 0111 0000 
Store immediate 0001 0000 
Store last 0011 0000 
Load reply 1100 0000 
Store reply 1000 0000 





9.6.1.1 Store Operations 
There are three operations defined for I/O controller interface store operations from the 
MPC601 to the BUC, defined as follows: 

1. Store immediate operations transfer up to 32 bits of data 

2. Store last operations transfer up to 32 bits of data each from the MPC601 to the BUC 


3. Store reply from the BUC reveals the success/failure of that I/O controller interface 
access to the MPC601. 


An I/O controller interface store access consists of one or more data transfer operations 
followed by the I/O store reply operation from the BUC. If the data can be transferred in 
one 32-bit data transaction, it is marked as a store last operation followed by the store reply 
operation; no store immediate operation is involved in the transfer, as shown in the 
following sequence: 


STORE LAST (from MPC601) 


STORE REPLY (from BUC) 
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However, if more data is involved in the I/O controller interface access, there will be one 
or more store immediate operations. The BUC can detect when the last data is being 
transferred by looking for the store last opcode, as shown in the following sequence: 
STORE IMMEDIATE(s) 
STORE LAST 


STORE REPLY 


9.6.1.2 Load Operations 

I/O controller interface load accesses are similar to store operations, except that the 
MPC601 latches data from the addressed BUC rather than supplying the data to the BUC. 
As with memory accesses, the MPC6()1 is the master on both load and store operations; the 
external system must provide the data bus grant to the MPC601 when the BUC is ready to 
supply the data to the MPC601. 


The load request I/O controller interface operation has no analogous store operation; it 
informs the addressed BUC of the total number of bytes of data that the BUC must provide 
to the MPC601 on the subsequent load immediate/load last operations. For I/O controller 
interface load accesses, the simplest, 32-bit (or fewer) data transfer sequence is as follows: 


LOAD REQUEST 
LOAD LAST 


LOAD REPLY (from BUC) 
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However, if more data is involved in the I/O controller interface access, there will be one 
or more load immediate operations. The BUC can detect when the last data is being 
transferred by looking for the load last opcode, as seen in the following sequence: 
LOAD REQUEST 
LOAD IMM(s) 
LOAD LAST 


LOAD REPLY 


Note that three of the seven defined operations are address-only transactions and do not use 
the data bus. However, unlike the memory transfer protocol, these transactions are not 
broadcast from one master to all snooping devices; The I/O controller interface address- 
only transaction protocol strictly controls communication between the MPC601 and the 
BUC. 


9.6.2 1/O Controller Interface Transaction Protocol Details 


As mentioned previously, there are two address-bus beats corresponding to two packets of 
information about the address. The two packets contain the sender and receiver tags, the 
address and extended address bits, and extra control and status bits. The two beats of the 
address bus (plus attributes) are shown at the top of Figure 9-24 as two packets. The first 
packet, packet (), is then expanded to depict the XATC and address bus information in 
detail. 


9.6.2.1 Packet 0 


Figure 9-24 shows the organization of the first packet in an I/O controller interface 
transaction. 


The XATC contains the I/O opcode, as discussed earlier and as shown in Table 9-8. The 
address bus contains the following: 


Key bit Il segment register Il sender tag 
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Figure 9-24. I/O Controller Interface Operation—Packet 0 


The XATC contains the I/O opcode, as discussed earlier and as shown in Table 9-8. The 
address bus contains the following: 


Key bit Il segment register || sender tag 


This information is organized as follows: 


¢ Bits 0 and 1 of the address bus are reserved—the MPC601 always drives these bits 
to zero. 


¢ Key bit—Bit 2 is the key bit from the segment register (either SR[Ku] or SR[Ks]). 
Ku indicates user-level access and Ks indicate supervisor-level access. The MPC601 
multiplexes the correct key bit into this position according to the current operating 
context (user or supervisor). 


¢ Segment register—Address bits 3-27 correspond to bits 3—27 of the segment 
register. Note that address bits 3-11 form the nine-bit receiver tag. Software must 
initialize these bits in the segment register to the ID of the BUC to be addressed; they 
are referred to as the BUID (bus unit ID) bits. 

¢ PID (sender tag)—Address bits 28-31 form the four-bit sender tag. These bits come 
from bits 28-31 of the MPC601 PID (processor ID) register. A four-bit tag allows a 
maximum of 16 processor IDs to be defined for a given system. If more bits are 
needed for a very large multiprocessor system, for example, it is envisioned that the 
second-level cache (or equivalent logic) can append a larger processor tag as 
needed. The BUC addressed by the receiver tag should latch the sender address 
required by the subsequent I/O reply operation. 


9.6.2.2 Packet 1 


The second address beat, packet 1, transfers byte counts and the physical address for the 
transaction, as shown in Figure 9-25. 
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Figure 9-25. I/O Controller Interface Operation—Packet 1 


For packet 1, the XATC is defined as follows: 


¢ Load request operations—XATC contains the total number of bytes to be transferred 
(128 bytes maximum for MPC60)1) 


¢ Immediate/last (load or store) operations—XATC contains the current transfer byte 
count (one to four bytes.) 


Address bits 0-31 contain the physical address of the transaction. The physical address is 
generated by concatenating segment register bits 28-31 with bits 4-31 of the effective 
address, as follows: 


Segment register (bits 28-31) ll effective address (bits 4-31) 


While the MPC601 provides the address of the transaction to the BUC, the BUC must 
maintain a valid address pointer for the reply. 


9.6.3 I/O Reply Operations 


BUCs must respond to MPC601 I/O controller interface transactions with an I/O reply 
operation. The purpose of this reply operation is to inform the MPC6()1 of the success or 
failure of the attempted I/O controller interface access. This requires the system I/O 
controller interface to have MPC601 bus mastership capability—a substantially more 
complex design task than bus slave implementations that use memory-mapped I/O access. 


Reply operations from the BUC to the MPC6()] are address-only transactions. As with 
packet 0 of the address bus on MPC6(0)1 I/O controller interface operations, the XATC 
contains the opcode for the operation (see Table 9-8). Additionally, the I/O reply operation 
transfers the sender/receiver tags in the first beat. 
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Figure 9-26. I/O Reply Operation 
The address bits are described in Table 9-9. 


Table 9-9. Address Bits for I/O Reply Operations 


Reserved. These bits should be set to zero for compatibility with future PowerPC microprocessors. 
Error bit. It is set if the BUC records an error in the access. 


ra BUID. Sender tag of a reply operation. Corresponds with bits 3-11 of one of the MPC601 segment 
registers. 


12-27 Address bits 12-27 are BUC-specific and are ignored by the MPC601. 


28-31 PID (receiver tag). The MPC601 effectively snoops operations on the bus and, on reply operations, 
compares this field to bits 28-31 of the PID register to determine if it should recognize this I/O reply. 










The second beat of the address bus is reserved; the XATC and address buses should be 
driven to zero to preserve compatibility with future protocol enhancements. 


The following sequence occurs when the MPC6()1 detects an error bit set on an I/O reply 
operation: 


1. The MPC601 completes the instruction that initiated the access. 
2. If the instruction is a load, the data is forwarded onto the register file(s)/sequencer. 


3. An I/O controller interface error exception is generated, which transfers MPC6(0)1 
control to the I/O controller interface error exception handler to recover from the 
error. Refer to Section 5.4.10, “I/O Controller Interface Error Exception 
(x’Q0A00’),” for more information. 


If the error bit is not set, the MPC601 instruction that initiated the access completes and 
instruction execution resumes. 
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System designers should note the following: 


¢ “Misplaced” reply operations (that match the processor tag and arrive unexpectedly) 
cause a checkstop condition. Refer to Chapter 5, “Exceptions,” for more 
information. 


¢ External logic must assert AACK for the MPC60)1, even though it is the receiver of 
the reply operation. AACK is an input-only to the MPC601. 


* The MPC601 monitors address parity when enabled by software and XATS and 
reply operations (load or store). 


9.6.4 I/O Controller Interface Operation Timing 


The following timing diagrams show the sequence of events in a typical MPC60)1 I/O 
controller interface load access (Figure 9-27) and a typical MPC601 I/O controller 
interface store access (Figure 9-28). All arbitration signals except for ABB and DBB have 
been omitted for clarity. Note that for either case, the number of immediate operations 
depends on the amount and the alignment of data to be transferred. If no more than four 
bytes are being transferred, and the data is double-word aligned (that is, does not straddle 
an eight-byte address boundary), there will be no immediate operation as shown in the 
figures. 


The MPC6()1 can transfer as many as 128 bytes of data in one load or store instruction 
(requiring more than 33 immediate operations in the case of misaligned operands). 


In Figure 9-27, XATS is asserted with the same timing relationship as TS in a memory 
access. Notice, however, that the address bus (and XATC) transition on the next bus clock 
cycle. The first of the two beats on the address bus is valid for one bus clock cycle window 
only, and that window is defined by the assertion of XATS. The second address bus beat, 
however, can be extended by delaying the assertion of AACK until the system has latched 
the address. 


The load request and load reply operations shown in Figure 9-27, are address-only 
transactions, as denoted by the negated TT3 signal during their respective address tenures. 
Note that other types of bus operations can occur between the individual I/O controller 
interface operations on the bus. The MPC601 involved in this transaction, however, does 
not initiate any other transactions once the first I/O controller interface operation has begun 
address tenure except for cache-sector snoop push-out operations resulting from snoop hits. 


Notice that, in this example (zero wait states), 13 bus clock cycles are required to transfer 
no more than eight bytes of data. 
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Figure 9-27. I/O Controller Interface Load Access Example 


Figure 9-28 shows an I/O store access, comprised of three I/O controller interface 
operations in this example. As with the example in Figure 9-27, notice that data is 
transferred only on the 32 bits of the DH bus. As opposed to Figure 9-27, there is no request 
operation since the MPC601 has the data ready for the BUC. 


The TEA signal may be asserted on any I/O controller interface operation. If it is asserted, 
the processor enters a checkstop condition if MSR[ME] is cleared, or it will queue a 
machine check exception if ME is set. After TEA is asserted, it must be reasserted for all 
tenures associated with the current I/O controller interface operation until the load last or 
store last operation occurs. When the operation occurs, the execution unit is released to take 
the machine check exception. If the TEA signal is asserted for an I/O controller interface 
operation, the reply operations (store reply or load reply) must not occur. If it does, it causes 
a checkstop condition. If the TEA signal is not asserted with a given I/O controller interface 
operation, the result of the assertion of TEA are unpredictable. The MPC6()1 may take a 
machine check exception or cause as checkstop condition. 
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Figure 9-28. I/O Controller Interface Store Access Example 


9.7 Interrupt, Checkstop, and Reset Signals 


This section describes external interrupts, checkstop operations, and hard and soft reset 
inputs. 


9.7.1 External Interrupt 


The maskable interrupt input (INT) to the MPC6()1 eventually forces the processor to take 
the external interrupt vector if the MSR(EE) bit is set. See Chapter 5, “Exceptions,” for 
more information about interrupts and exceptions. 


9.7.2 Checkstops 


The MPC601 has two checkstop signals, an input (CKSTP_IN) and an output 
(CKSTP_OUT). If CKSTP_IN is asserted, the MPC601 halts operations by gating off all 
internal clocks. The MPC601 does not assert CKSTP_OUT if CKSTP_IN if asserted. 


If CKSTP_OUT is asserted, the MPC60)1 has checkstopped internally. The CKSTP_OUT 
signal can be asserted for various reasons including receiving a TEA signal, as the result of 
an instruction dispatch, or internal and external parity errors. For more information on 
checkstop state, refer to Section 5.4.2.2, “Checkstop State (MSR[ME] =0).” 


Note that checkstop conditions can be disabled by setting bits in the HIDO register. For 
information, see Section 2.3.3.12.1, “Checkstop Sources and Enables Register—HID0.” 
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9.7.3 Reset Inputs 
The MPC601 has two reset inputs, described as follows: 


*HRESET (hard reset)—The HRESET signal is used for power-on reset sequences, or 
for situations in which the MPC601 must go through the entire cold-start sequence 
of self-tests. Once asserted, this input must be held asserted for a minimum of 300 
processor clock cycles to ensure that the processor has had enough time to recognize 
the input and initialize registers. This information is provided in Appendix D, 
“Reset.” 


¢ SRESET (soft reset)—The soft reset input provides warm reset capability. This 
input can be used to avoid forcing the MPC601 to complete the cold start sequence. 
This can be useful to recover from such conditions as check stop or some machine- 
check states that cannot be restarted. 


When either reset input is negated and if the self-test sequence completes without error, the 
processor attempts to fetch code from the system reset exception vector. The vector is 
located at offset x'00100' from the exception prefix (all zeros or ones, depending on the 
setting of the exception prefix bit in the machine state register (MSR[EP]). The EP bit is set 
for HRESET. 


9.7.4 Soft Stop Control Signals 


The soft stop control signals allow the processor to stop the clocks and bring the activity to 
a quiescent state in an orderly fashion (as opposed to a hard stop, which simply halts the 
clocks without regard to system activity). 


The soft stop state is entered by asserting the QUIESC_REQ signal. This signal allows the 
system to complete any bus activities that might be affected by stopping the clocks. When 
the system is ready to enter the soft stop state, it asserts the SYS_QUITESC signal. At this 
time the MPC601 takes a soft stop. 


During a soft stop all internal clocking is disabled after the system activity quiesces in an 
orderly manner, that is, there are no partially finished instructions. Soft stop is typically 
used for debugging; during the soft stop, the state bits in the chip can be scanned, examined 
and scanned back in. The processor returns to normal operation when the RESUME signal 
is asserted. 


9.8 Processor State Signals 


This section describes the MPC6()I's support for atomic update and storage through the use 
of the Iwarx/stwex. opcode pair and the configuration options for the MPC601 output 
buffer. 


9.8.1 Support for the lwarx/stwex. Instruction Pair 


The Load Word and Reserve Indexed (Iwarx) and the Store Word Conditional Indexed 
(stwex.)instructions provide a means for atomic memory updating. Memory can be updated 
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atomically by setting a reservation on the load and checking that the reservation is still valid 
before the store is performed. In the MPC601, the reservations are made on behalf of 
aligned, 32-byte sections of the memory address space. 


The reservation (RSRV) output signal is always driven by bus clock cycle and reflects the 
status of the reservation coherency bit in the reservation address register (see Chapter 4, 
“Cache and Memory Unit Operation,” for more information). See Section 8.2.9.8, 
“Reservation (RSRV)—Output,” for information about timing. 


9.9 IEEE 1149.1-Compatible Interface 


The MPC601 provides a boundary-scan interface compatible with IEEE 1149.1-compliant 
parts. Although the standard allows built-in self-test (BIST), the MPC6()1 interface 
supports only boundary scan. This section briefly describes the MPC601 interface and its 
differences with the IEEE 1149.1 interface. 


9.9.1 Deviations from the IEEE 1149.1 Boundary-Scan Specifications 
The MPC601 deviates from the IEEE 1149.1 specifications in the following ways: 


¢ Inthe IEEE 1149.1 specifications, no mode pin is required to use the IEEE 1149.1 
boundary-scan interface. However, in the MPC6()1, the scan enable mode input 
) signal must be asserted to run boundary-scan testing. The signal must 
be pulled up when boundary-scan testing is not being performed. 


¢ Whereas the IEEE 1149.1 specifications indicate that only TCK should be used to 
clock data-register latches, in the MPC601 the processor system clock must be 
active (oscillating) during testing. 

* The MPC601 implements only the PRELOAD portion of the SAMPLE/PRELOAD 
function. 


¢ IEEE 1149.1 specifies that data on the primary output should be held valid while the 
processor is in the SHIFT DT state and that data should change only in the UPDATE 
DT or UPDATE IT states (assuming the instruction is valid). In the MPC601, no 
stable values are held on primary outputs for the SHIFT DT state. The SHIFT DT 
state forces primary outputs to high impedance. Outputs are enabled if the IT 
contains a valid instruction and the TAP is in the UPDATE DT or UPDATE IT state. 


¢ JEEE 1149.1 specifies that asserting the TRST signal should reset only the TAP. In 
the MPC601, the TRST signal resets the TAP, system logic, and the COP. 


IEEE 1149.1 also specifies the use of the TRST signal to disable TAP. On the 
MPC601, this can be done by negating the BSCAN_EN signal, which prohibits 
resetting the TAP and system logic independently. The TRST signal should not be 
used to disable the TAP in the system functional environment; the BSCAN_EN 
signal should be used. The user can use the TRST signal as described above or hold 
TMS high for tive TCK cycles. Note that not all SRLs in the MPC601 are boundary- 
scan SRLs. The boundary-scan chain includes functional system SRLs. 
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9.9.2 Additional Information about the IEEE 1149.1 Interface 
Note the following points concerning the IEEE 1149.1 interface: 


¢ Because the driver inhibit to all COMMON IO signals is controlled by a common 
signal, all COMMON INPUT/OUTPUT devices must inbound off-chip data or 
outbound on-chip data. 


¢ Not all SRLs in the boundary-scan chain are boundary-scan SRLs. 


9.10 Using DBWO (Data Bus Write Only) 


The MPC601 supports split transaction pipelined transactions. Additionally, the DBWO 
signal allows the MPC601 to be configured dynamically to source write data out of order 
with respect to read data. 


In general, an address tenure on the bus is followed strictly in order by its associated data 
tenure. Transactions pipelined by a single MPC6(0)1 complete strictly in order. However, the 
MPC6()1 can run bus transactions out of order only when the external system allows the 
MPC6()1 to perform a cache-sector snoop push-out operation (or other write transaction, if 
pending in the MPC601 write queues) between the address and data tenures of a read 
operation through the use of DBWO. This effectively envelopes the write operation within 
the read operation. This can be useful in some external queued controller scenarios or for 
more complex memory implementations that can support so-called dump-and-run 
operations. These include the cache sector cast out of a modified sector caused by a load 
miss. A replacement copyback operation can be written to memory buffers while the 
memory location is being accessed for the line fill. The sector is written (dumped) into 
memory buffers while the memory is accessed for the load operation. Optimally, the 
replacement copy-back operation can be absorbed by the memory system without affecting 
load memory latency. Figure 9-11 gives an example of the use of the DBWO input. 


Figure 9-29 illustrates the following sequence of operations: 


1. Processor A begins a read operation. (Bus clock cycle 2) 


2. Processor B attempts a global read but is interrupted by a retry from processor A 
(bus clock cycle 7) 


3. Processor A performs a cache-sector snoop push-out operation out of order because 
of the assertion of DBWO (bus clock cycle 8) 


4. Processor B successfully performs the global read (bus clock cycle 13) 
5. Processor A successfully concludes its original read operation (bus clock cycle 16) 
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Figure 9-29. Data Bus Write-Only Transaction 


Note that although the MPC6()1 can pipeline any write transaction behind the read 
transaction, special care should be used when using the enveloped write feature. It is 
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envisioned that most system implementations will not need this capability; for these 
applications DBWO should remain negated. In systems where this capability is needed, 
DBWO should be asserted under the following scenario: 


1. The MPC601 initiates a read transaction (either single-beat or burst) by completing 
the read address tenure with no address retry. 


2. Then, the MPC6(01 initiates a write transaction by completing the write address 
tenure, with no address retry. 


3. At this point, if DBWO is asserted with a qualified data bus grant to the MPC601, 
the MPC601 asserts DBB and drives the write data onto the data bus, out of order 


with respect to the address pipeline. The write transaction concludes with the 
MPC601 negating DBB. 


4, The next qualified data bus grant signals the MPC601 to complete the outstanding 
read transaction by latching the data on the bus. This assertion of DBG should not 
be accompanied by an asserted DBWO. 

Any number of bus transactions by other bus masters can be attempted between any of 
these steps. 


Note the following regarding DBWO: 
¢ DBWO cannot be asserted if no data bus write tenures are pending. 


¢ DBWO can be asserted if no data bus read is pending, but it has no effect on write 
ordering. 


¢ The ordering and presence of data bus writes is determined by the writes in the write 
queues at the time BG is asserted for the write address (not DBG). If a particular 
write is desired (for example, a cache-sector snoop push-out operation), then BG 
must be asserted after that particular write is in the queue and it must be the highest 
priority write in the queue at that time. A cache-sector snoop push-out operations 
may be the highest priority write, but more than one may be queued. 


¢ Because more than one write may be in the write queue when DBG is asserted for 
the write address, more than one data bus write may be enveloped by a pending data 
bus read. 


The arbiter must monitor bus operations and coordinate the various masters and slaves with 
respect to the use of the data bus when DBWO is used. Individual DBG signals associated 
with each bus device should allow the arbiter to synchronize both pipelined and split- 
transaction bus organizations. Individual DBG signals provide a primitive form of source- 
level tagging for the granting of the data bus. 


Note that use of the DBWO signal allows some operation-level tagging with respect to the 
MPC601 and the use of the data bus. 
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Chapter 10 
Instruction Set 


This chapter describes individual instructions, including a description of instruction 
formats and notation and an alphabetical listing of the MPC601’s instructions by 
mnemonic. 


10.1 Instruction Formats 


Instructions are four-bytes long and word-aligned, so when instruction addresses are 
presented to the processor (as in branch instructions) the two low-order bits are ignored. 
Similarly, whenever the processor develops an instruction address, its two low-order bits 
are Zero. 


Bits 0-5 always specify the primary opcode. Many instructions also have an secondary 
opcode. The remaining bits of the instruction contain one or more fields for the different 
instruction formats. 


Some instruction fields are reserved or must contain a predefined value as shown in the 
individual instruction layouts. If a reserved field does not have all bits set to 0, or if a field 
that must contain a particular value does not contain that value, the instruction form is 
invalid and the results are as described in Appendix D, “Classes of Instructions”. 


10.1.1 Split Field Notation 


Some instruction fields occupy more than one contiguous sequence of bits or occupy a 
contiguous sequence of bits used in permuted order. Such a field is called a split field. In 
the format diagrams and in the individual instruction layouts, the name of a split field is 
shown in small letters, once for each of the contiguous sequences. In the pseudocode 
description of an instruction having a split field and in some places where individual bits of 
a split field are identified, the name of the field in small letters represents the concatenation 
of the sequences from left to right. Otherwise, the name of the field is capitalized and 
represents the concatenation of the sequences in some order, which need not be left to right, 
as described for each affected instruction. 
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10.1.2 Instruction Fields 
Table 10-1 describes the instruction fields used in the various instruction formats. 


Table 10-1. Instruction Formats 


Absolute address bit 

0 The immediate field represents an address relative to the current instruction address. The 
effective address of the branch is either the sum of the LI field sign-extended to 32 bits and the 
address of the branch instruction or the sum of the BD field sign-extended to 32 bits and the 
address of the branch instruction. 

1 The immediate field represents an absolute address. The effective address of the branch is 
the LI field sign-extended to 32 bits or the BD field sign-extended to 32 bits. 


crbA (11-15) Field used to specify a bit in the CR to be used as a source. 
crbB (16-20) Field used to specify a bit in the CR to be used as a source. 


BD (16-29) Immediate field specifying a 14-bit signed two's complement branch displacement that is 
concatenated on the right with b’00’ and sign-extended to 32 bits. 


crfD (6-8) Field used to specify one of the CR fields or one of the FPSCR fields as a destination. 
crfS (11-13) Field used to specify one of the CR fields or one of the FPSCR fields as a source. 


BI (11-15) Field used to specify a bit in the CR to be used as the condition of a branch conditional 
instruction. 


BO (6-10) Field used to specify options for the branch conditional instructions. The encoding is described in 
Section 3.7.1, “Branch Instructions”. 


crbD (6-10) Field used to specify a bit in the CR or in the FPSCR as the destination of the result of an 
instruction. 


d(16-31) Immediate field specifying a 16-bit signed two's complement integer that is sign-extended to 32 
bits. 


LI (6-29) Immediate field specifying a 24-bit, signed two's complement integer that is concatenated on the 
right with b'00’ and sign-extended to 32 bits. 


LK (31) Link bit. 
0 Does not update the link register. 
1 Updates the link register. If the instruction is a branch instruction, the address of the instruction 
following the branch instruction is placed into the link register. 


MB (21-25) and Fields used in rotate instructions to specify a 32-bit mask consisting of 1-bits fram bit MB+32 
ME (26-30) through bit ME+32 inclusive, and 0-bits elsewhere, as described in Section 3.3.4, “Integer Rotate 
and Shift Instructions”. 
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Table 10-1. Instruction Formats (Continued) 


Primary opcode field. 


Re (31) Record bit 
0 Does not update the condition register. 
1 Updates the condition register (CR) to reflect the result of the operation. 
For integer instructions, CR bits 0-3 are set to reflect the result as a signed quantity. The 
result as an unsigned quantity or a bit string can be deduced from the EQ bit. For floating-point 
instructions, CR bits 4-7 are set to reflect floating-point exception, floating-point enabled 
exception, floating-point invalid operation exception, and floating-point overflow exception. 


Field used to specify a special purpose register for the mtspr and mfspr instructions. The 
encoding is described in Section 3.8.1, “Move To/From Special Purpose Register Instructions”. 

Field used to specify the conditions on which to trap. The encoding is described in Section 3.7.5, 
“Trap Mnemonics”. 


XO (21-30, 22- Secondary opcode field. 
30, 26-30, or 30) 


10.1.3 Notation and Conventions 


The operation of some instructions is described by a semiformal language (pseudocode). 
See Table 10-2 for a list of pseudocode notation and conventions used throughout this 
chapter. 





























Table 10-2. Pseudocode Notation and Conventions 


: 
* 
+ 











Multiplication 
Division (yielding quotient) 
ie | Two’s-complement addition 
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Table 10-2. Pseudocode Notation and Conventions (Continued) 


Notation/Convention 
P- Two’s-complement subtraction, unary minus 
Equals and Not Equals relations 
Signed comparison relations 

? 
( 







































<U,>U Unsigned comparison relations 


Unordered comparison relation 


Used to describe the concatenation of two values (i.e., 010 || 111 is the same as 010111) 


cy AND, OR logical operators 


Exclusive-OR, Equivalence logical operators ((a=b) 


(a®-b)) 


x ‘nnnn Anumber expressed in hexadecimal format 


Anumber expressed in binary format 


2 

The contents of rA if the rA field has the value 1-31, or the value 0 if the rA field is 0 

As the last character of an instruction mnemonic, a period (.) means that the instruction 

updates the Condition Register field. 

Least integer > x 

Result of converting x form floating-point single format to floating-point double format. 
(x) 
) 








Result of extending x on the left with sign bits 

General Purpose Register x 

Mask having 1's in positions x through y (wrapping if x > y) and 0’s elsewhere 
Contents of y bytes of memory starting at address x 

Result of rotating the 64-bit value x||x left y positions, where x is 32 bits long 

Result of converting x from floating-point double format to floating-point single format. 
Special Purpose Register x 

x is raised to the nth power 


The replication of x, n times (i.e., x concatenated to itself n-1 times). (n)O and (n)1 are 


special cases 


nis a bit or field within x, where x is a register 
TRAP Invoke the system trap handler 


undefined An undefined value. The value may vary from one implementation to another, and from 
one execution to another on the same implementation. 


Reference to the setting of status bits, in a standard way that is explained in the text 


Current Instruction Address, which is the 32-bit address of the instruction being 








described by a sequence of pseudocode. Used by relative branches to set the Next 
Instruction Address (NIA). Does not correspond to any architected register. 
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Table 10-2. Pseudocode Notation and Conventions (Continued) 


NI Next Instruction Address, which is the 32-bit address of the next instruction to be 
executed (the branch destination) after a successful branch. In pseudocode, a 
successful branch is indicated by assigning a value to NIA. For instructions which do not 
branch, the next instruction address is CIA +4. 


if...then...else... Conditional execution, indenting shows range, else is optional 


Do loop, indenting shows range. “To” and/or “by” clauses specify incrementing an 
iteration variable, and “while’ and/or “until” clauses give termination conditions, in the 
usual manner. 

Leave innermost do loop, or do loop described in leave statement 


Precedence rules for pseudocode operators are summarized in Table 10-3. 





Table 10-3. Precedence Rules 


Associativity 
Right to lett 
x(n) or exponentiation 

Mili ante tented Left to right 














=,4,<,S,>,2,<U,>U,? Left to right 





Note that operators higher in Table 10-3 are applied before those lower in the table. 
Operators at the same level in the table associate from left to right, from right to left, or not 
at all, as shown. 


10.2 MPC601 Instruction Set 


The remainder of this chapter lists and describes the instruction set for the MPC60)1. The 
instructions are listed in alphabetical order by mnemonic and include those instructions that 
are specific to the MPC601 that are not specified as part of the PowerPC architecture. 
Figure 10-1 shows the format for each instruction description page. 
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addx 


Integer Unit 


Instruction Name 










Instruction Syntax add rD,rAvrB (OE=0 Re=0) 
add. rD,rA vB (OE=0 Rc=1) 
addo rDrArB (OE=1 Re=0) 
addo. rD,rArB (OE=1 Rce=1) 





Equivalent Power Mnemonics [POWER mnemonics: cax, cax., caxo, caxo.] 














Instruction Encoding 


0 56 £41011 1516 20 2122 30 31 






Pseudocode Description of 
Instruction Operation 


Text Description of 
Instruction Operation 


Registers Altered by Instruction 


rl & (rA) + (rB) 





The sum (rA) + (rB) is placed into rD. 


Other registers altered: 


¢ Condition Register (CRO Field): 







Affected: LT, GT, EQ, SO (if Re=1) 
« XER: 


Affected: SO, OV 





(if OE=1) 


Figure 10-1. Instruction Description 


Note in Figure 10-1 that the execution unit that executes the instruction may not be the 
same for other PowerPC processors. 
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absx POWER Architecture Instruction absx 


Absolute Integer Unit 
abs rD,rA (OE=0 Rc=0) 
abs. rD,rA (OE=0 Rc=1) 
abso rD,rA (OE=1 Rc=0) 
abso. rD,rA (OE=1 Rc=1) 
Reserved 






0 5 6 10 11 1516 202122 30 31 


This instruction is not part of the PowerPC architecture. 


The absolute value I(rA)l is placed into rD. If rA contains the most negative number (i.e., 
x’8000 0000”), the result of the instruction is the most negative number and sets XER[OV] 
if overflow signaling is enabled. 
Other registers altered: 

* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 


Note: This instruction is specific to the MPC601. 
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addx | addx 


Add Integer Unit 
add rD,rA,rB (OE=0 Rc=0) 
add. rD,rA,rB (OE=0 Rc=1) 
addo rD,rA,rB (OE=1 Rc=0) 
addo. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: cax, cax., cax0, cax0.] 


ee a 


10 11 20 21 22 30 31 
rD < (rA) + (rB) 
The sum (rA) + (rB) is placed into rD. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 
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addcx addcx 


Add Carrying Integer Unit 
addc rD,rA,rB (OE=0 Rc=0) 
addc. rD,rA,rB (OE=0 Rc=1) 
addco rD,rA,rB (OE=1 Rc=0) 
addco. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: a, a., a0, a0.] 


ee 


10 11 15 16 20 21 22 30 31 
rD <— (rA) + (rB) 
The sum (rA) + (rB) is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 
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addex addex 


Add Extended Integer Unit 
adde rD,rA,rB (OE=0 Rc=0) 
adde. rD,rA,rB (OE=0 Rc=1) 
addeo rD,rA,rB (OE=1 Rc=0) 
addeo. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: ae, ae., aed, aeo.] 


Fea ae eet Rae EE ee ee 


10 11 15 16 20 21 22 30 31 
rD < (rA) +(rB) + XER[CA] 
The sum (rA) + (rB) + XER[CA] is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 
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addi 


Add Immediate 


addi rD,rA,SIMM 


[POWER mnemonic: cal] 


addi 


Integer Unit 


14 
0 5 6 


10 11 15 16 


if rA=0 then rD<-EXTS(SIMM) 
else rD<(rA)+EXTS(SIMM) 


The sum (rAl 0) + SIMM 


Other registers altered: 
« None 
Simplified mnemonics: 


subi rA,rB,value 


MOTOROLA 


is placed into rD. 


equivalent to addi rD,rA,-value 


Chapter 10. Instruction Set 


31 
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addic addic 


Add Immediate Carrying Integer Unit 


addic rD,rA,SIMM 
[POWER mnemonic: ai] 


[ie [To [| ] 
0 5 6 10 11 15 16 31 
rD < (rA) + EXTS(SIMM) 
The sum (rA) + SIMM is placed into rD. 
Other registers altered: 
* XER: 
Affected: CA 
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addic. addic. 


Add Immediate Carrying and Record Integer Unit 


addic. rD,rA,SIMM 
[POWER mnemonic: ai.] 


0 


5 6 10 11 15 16 31 


rD < (rA) + EXTS(SIMM) 
The sum (rA) + SIMM is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO 
* XER: 
Affected: CA 
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addis addis 


Add Immediate Shifted Integer Unit 
addis rD,rA,SIMM 


[POWER mnemonic: cau] 


0 


5 6 10 11 15 16 . 31 


if rA=0 then rD<-(SIMM Il (16)0) 

else rD<(rA)+(SIMM II (16)0) 
The sum (rAl 0) + (SIMM II x’0000’) is placed into rD. 
Other registers altered: 


e None 
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addmex addme«x 


Add to Minus One Extended Integer Unit 
addme rD,rA (OE=0 Rc=0) 
addme. rD,rA (OE=0 Rc=1) 
addmeo rD,rA (OE=1 Rce=0) 
addmeo. rD,rA (OE=1 Rc=1) 


[POWER mnemonics: ame, ame., ameo, ameo. ] 


Reserved 





0 5 6 10 11 15 16 20 21 22 30 31 


rD < (rA) + XER[CA] - 1 
The sum (rA)+XER[CA]+x ‘FFFFFFFF’ is placed into rD. 
Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
e XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 


MOTOROLA Chapter 10. Instruction Set 10-15 


addzex addzex 


Add to Zero Extended Integer Unit 
addze rD,rA (OE=0 Re=0) 
addze. rD,rA (OE=0 Rc=1) 
addzeo rD,rA (OE=1 Rc=0) 
addzeo. rD,rA (OE=1 Rc=1) 


[POWER mnemonics: aze, aze., azeo, azeo.| 


Reserved 





0 5 6 10 11 15 16 20 21 22 30 31 


rD < (rA) + XER[CA] 
The sum (rA)+XER[CA] is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
* XER: 
Affected: CA 
0 | Affected: SO, OV (if OB=1) 
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andx andx 
AND Integer Unit 


and rA,rsS,rB (Rc=0) 
and. rA,rS,rB (Rc=1) 


[= [Ts [ « [3 | * 
0 5 6 10 11 15 16 20 21 30 31 
rA <— (rS) & (rB) 
The contents of rS is ANDed with the contents of rB and the result is placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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andcx andcx 


AND with Complement Integer Unit 
andc rA,rS,rB (Re=(0) 
andc. rA,rS,rB (Rc=1) 
ee eee le eerste es ne 
0 5 6 10 11 15 16 20 21 30 31 
rAe(rS)+ -(rB) 


The contents of rS is ANDed with the one’s complement of the contents of rB and the result 
is placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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andi. andi. 


AND Immediate Integer Unit 


andi. rA,rS,;UIMM 
[POWER mnemonic: andil.] 


0 


5 6 10 11 15 16 31 
rA¢(rS) & (x’0000’I!UIMM) 
The contents of rS is ANDed with x’0000' || UIMM and the result is placed into rA. 
Other registers altered: 


¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO 
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andis. andis. 


AND Immediate Shifted Integer Unit 


andis. rA,rS,;UIMM 
[POWER mnemonic: andiu. ] 


SS 
0 5 6 10 11 15 16 31 
rA<(rS)+(UIMM Il x’0000’) 
The contents of rS is ANDed with UIMM II x0000’ and the result is placed into rA. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO 
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bx 


Branch 


b 
ba 
bl 
bla 


target_addr 
target_addr 
target_addr 
target_addr 


(AA=0 LK=0) 
(AA=1 LK=0) 
(AA=0 LK=1) 
(AA=1 LK=1) 


bx 


Branch Processing Unit 


0 5 


6 


if AA, then NIAC-EXTS(LI ll b’00’) 
else NIA€CIA+EXTS(LI Il b’00’) 


if LK, then 


LReCIAH4 


target_addr specifies the branch target address. 


29 30 31 


If AA=0, then the branch target address is the sum of LI Il b’00’ sign-extended and the 
address of this instruction. 


If AA=1, then the branch target address is the value LI II b’00’ sign-extended. 


If LK=1, then the effective address of the instruction following the branch instruction is 


placed into the link register. 


Other registers altered: 
Affected: Link Register (LR) (if LK=1) 


MOTOROLA 


Chapter 10. Instruction Set 
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bcx bcx 


Branch Conditional Branch Processing Unit 


be BO,BI,target_addr (AA=0 LK=0) 
bca BO,BI,target_addr (AA=1 LK=0) 
bel BO,BI,target_addr (AA=() LK=1) 
bela BO,BI,target_addr (AA=1 LK=1) 


a a a ee ee 
0 


5 6 10 11 15 16 29 30 31 


if ~BO[2], then CTR — CTR-1 
ctr_ok — BO[2]! ((CTRx0) ® BO[3)) 
cond_ok — BO[0] | (CRTBIT = BO[1]) 
if ctr_ok & cond_ok, then 
if AA, then NIA — EXTS(BD II b’00’) 
else NIA < CIA+EXTS(BD Il b’00’) 
if LK, then 
LR <— CIAH4 


The BI field specifies the bit in the Condition Register (CR) to be used as the condition of 
the branch. The BO field is used as described above. 


target_addr specifies the branch target address. 


If AA=0, the branch target address is the sum of BD II b’00’ sign-extended and the address 
of this instruction. 


If AA=1, the branch target address is the value BD II b’00’ sign-extended. 


If LK=1, the effective address of the instruction following the branch instruction is placed 
into the link register. 
Other registers altered: 

Affected: Count Register (CTR) Gif BO[2]=0) 

Affected: Link Register (LR) (if LK=1) 


Simplified mnemonics: 


bit target equivalent to be =: 12,0,target 
bne _cr2,target equivalent to be 4,10,target 
bdnz target equivalent to be _— 16,0, target 
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bcctrx bcctrx 


Branch Conditional to Count Register Branch Processing Unit 
bectr BO,BI (LK=0) 
bectrl BO,BI (LK=1) 


[POWER mnemonics: bec, becl] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


cond_ok © BO[0] | (CR[BI] = BO[1!]) 
if cond_ok then 
NIA © CTR II b’00’ 
if LK then 
LR & CIAH4 
The BI field specifies the bit in the condition register to be used as the condition of the 
branch. The BO field is used as described above, and the branch target address is 
CTR[O—29] Il b’00”. 


If LK=1, the effective address of the instruction following the branch instruction is placed 
into the link register. 


If the “decrement and test CTR” option is specified (BO[2]=0), the instruction form is 
invalid. 


In the case of BO[2]=0 on the MPC601, the decremented count register is tested for zero 
and branches based on this test, but instruction fetching is directed to the address specified 
by the non-decremented version of the count register. The use of this invalid form of the 
bectrx instruction is not recommended. This description is provided for informational 
purposes only. 
Other registers altered: 

Affected: Link Register (LR) (if LK=1) 
Simplified mnemonics: 
bltctr equivalent to bectr 12,0 


bnect cr2 equivalent to bectr 4,10 
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bclirx 


Branch Processing Unit 


bclirx 


Branch Conditional to Link Register 


belr BO,BI 
belrl BO,BI 


[POWER mnemonics: ber, berl] 


(LK=0) 
(LK=1) 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if —BO[2] then CTR — CTR-1 
ctr_ok — BO[2] 1 ((CTR+0) © BO[3]) 
cond_ok — BQ[0]! (CR[BI] = BO[1]) 
if ctr_ok & cond_ok then 

NIA < LR II b’00” 
if LK then 

LR <— CIAH 


The BI field specifies the bit in the condition register to be used as the condition of the 
branch. The BO field is used as described above, and the branch target address is 
LR[O-29] Il b’00’. 


If LK=1 then the effective address of the instruction following the branch instruction is 
placed into the link register. 
Other registers altered: 


Affected: Count Register (CTR) (if BO[2]=0) 


Affected: Link Register (LR) (if LK=1) 
Simplified mnemonics: 
bitir equivalent to belr 12,0 
bnelr cr2 equivalent to belr 4,10 
bdnzlr equivalent to belr 16,0 
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cles POWER Architecture Instruction cles 


Cache Line Compute Size Integer Unit 


cles rD,rA 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 
This instruction places the cache line size specified by rA into rD, according to the 
following: 


The value placed in rD shall be 64 for valid values of rA. 


(rA) Line Size Returned in rD 





Other registers altered: 
* Condition Register (CRO Field): 
Affected: Undefined (if Rc=1) 
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cmp | cmp 





Compare Integer Unit 
cmp crfD,L,rA,rB 
| Reserved 
0 5 6 15 16 20 21 30 31 
a € (rA) 
b © (rB) 


if a<bthenc<b’100’ 

else if a> b then c — b’010’ 

else c¢+b’001’ 
CR[4*crfD:4*crfD+3] <— c Il XER[SO] 


The contents of rA is compared with the contents of rB, treating the operands as signed 
integers. The result of the comparison is placed into CR Field erfD. 


The L operand controls whether the instruction operands are treated as 64- or 32-bit 
operands, with L=0 indicating 32-bit operands and L=1 indicating 64-bit operands. The 
state of the L operand does not effect the operation of the MPC601. 
Other registers altered: 
* Condition Register (CR Field specified by operand crfD): 
Affected: LT, GT, EQ, SO 
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cmpi cmpi 


Compare Immediate Integer Unit 


cmpi crfD,L,rA,SIMM 


Reserved 





0 5 6 8 9 10 11 15 16 31 


a (rA) 

if a<EXTS(SIMM) then c — b’100’ 
else if a> EXTS(SIMM) then c — b’010’ 
else c <b’001’ 
CR[4*erfD:4*crfD+3] < c ll XER[SO] 


The contents of rA is compared with the sign-extended value of the SIMM field, treating 
the operands as signed integers. The result of the comparison is placed into CR Field crfD. 


The L operand controls whether the instruction operands are treated as 64- or 32-bit 
operands, with L=0 indicating 32-bit operands and L=1 indicating 64-bit operands. The 
state of the L operand does not effect the operation of the MPC6()1. 
Other registers altered: 
* Condition Register (CR Field specified by operand crfD): 
Affected: LT, GT, EQ, SO 
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cmpl cmpl 





Compare Logical Integer Unit 
cmpl crfD,L,rA,rB 
Reserved 
0 5 6 8 9 10 11 15 16 20 21 
a¢(rA) 
b€ (rB) 


if a<bthenc + b’100’ 

else if a> b then c © b’010’ 

else ce b’00r 
CR[4*crfD:4*crfD+3] <— c Ill XER[SO] 


The contents of rA is compared with the contents of rB, treating the operands as unsigned 
integers. The result of the comparison is placed into CR Field erfD. 


The L operand controls whether the instruction operands are treated as 64- or 32-bit 
operands, with L=0 indicating 32-bit operands and L=1 indicating 64-bit operands. The 
state of the L operand does not effect the operation of the MPC6()1. 
Other registers altered: 
* Condition Register (CR Field specified by operand erfD): 
Affected: LT, GT, EQ, SO 
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cmpili cmpli 


Compare Logical Immediate Integer Unit 


cmpli = erfD,L,rA,UIMM 


Reserved 





a€(rA) 

b< (rB) 

if = a<(x’0000’ ll UIMM) then c <— b’100’ 
else if a > (x’0000’ I| UIMM) then c — b’010’ 
else c€hb’001’ 
CR[4*erfD:4*crfD+3] <— c ll XER{SO] 


The contents of rA is compared with x’0000’ || UIMM, treating the operands as unsigned 
integers. The result of the comparison is placed into CR Field erfD. 


The L operand controls whether the instruction operands are treated as 64- or 32-bit 
operands, with L=() indicating 32-bit operands and L=1 indicating 64-bit operands. The 
state of the L operand does not effect the operation of the MPC6()1. 
Other registers altered: 
¢ Condition Register (CR Field specified by operand erfD): 
Affected: LT, GT, EQ, SO 
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cntlzwx cntizwx 


Count Leading Zeros Word Integer Unit 
cntlzw rA,rS (Rc=() 
cntizw. rA,rS (Rc=1) 


[POWER mnemonics: cntlz, cntlz.] 


Reserved 





ne-O0 

do while n< 32 
if rS[nJ=1 then leave 
ne ntl 

rAcn 


A count of the number of consecutive zero bits starting at bit 0 of rS is placed into rA. This 
number ranges from 0) to 32, inclusive. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
For count leading zeros instructions, if Rc=1 then LT is cleared to zero in the CRO field. 
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crand crand 


Condition Register AND Integer Unit 


crand = crbD,crbA,crbB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


CR[cerbD] <— CR[crbA] & CR[crbB] 
The bit in the condition register specified by crbA is ANDed with the bit in the condition 
register specified by crbB. The result is placed into the condition register bit specified by 
crbD. 
Other registers altered: 
¢ Condition Register: 
Affected: Bit specified by operand crbD 
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crandc crandc 


Condition Register AND with Complement Integer Unit 


crandec = crbD,crbA,crbB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


CR[erbD] — CR[{crbA] & 4CR[crbB] 
The bit in the condition register specified by erbA is ANDed with the complement of the 
bit in the condition register specified by crbB and the result is placed into the condition 
register bit specified by crbD. 
Other registers altered: 
* Condition Register: 
Affected: Bit specified by operand crbD 
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creqv creqv 


Condition Register Equivalent Integer Unit 


creqv crbD,crbA,crbB 


Reserved 





CR[crbD} <— CR{crbA] = CR[crbB] 
The bit in the condition register specified by crbA is XORed with the bit in the condition 
register specified by crbB and the complemented result is placed into the condition register 
bit specified by erbD. 
Other registers altered: 
¢ Condition Register: 
Affected: Bit specified by operand crbD 
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crnand crnand 


Condition Register NAND Integer Unit 


crnand = crbD,crbA,crbB 


Reserved 





CR[crbD] <— -(CR[crbA] & CR[crbB]) 
The bit in the condition register specified by erbA is ANDed with the bit in the condition 
register specified by crbB and the complemented result is placed into the condition register 
bit specified by crbD. 
Other registers altered: 
* Condition Register: 
Affected: Bit specified by operand crbD 
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crnor crnor 


Condition Register NOR Integer Unit 


crnor crbD,crbA,crbB 


Reserved 






0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] — ~(CR[crbA] | CR[crbB]) 
The bit in the condition register specified by erbA is ORed with the bit in the condition 
register specified by crbB and the complemented result is placed into the condition register 
bit specified by crbD. 
Other registers altered: 
¢ Condition Register: 
Affected: Bit specified by operand crbD 
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cror cror 


Condition Register OR Integer Unit 


cror cerbD,crbA,crbB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] <— CR[crbA] | CR[crbB] 
The bit in the condition register specified by erbA is ORed with the bit in the condition 
register specified by crbB. The result is placed into the condition register bit specified by 
crbD. 
Other registers altered: 
* Condition Register: 
Affected: Bit specified by operand crbD 
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crore crore 


Condition Register OR with Complement Integer Unit 


crore crbD,crbA,crbB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


CR{crbD] — CR[crbA] | 4CR[crbB] 
The bit in the condition register specified by crbA is ORed with the complement of the 
condition register bit specified by crbB and the result is placed into the condition register 
bit specified by crbD. 
Other registers altered: 
* Condition Register: 
Affected: Bit specified by operand crbD 
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crxor Crxor 


Condition Register XOR Integer Unit 


crxor erbD,crbA,crbB 


Reserved 





0 “i 5 6 10 11 15 16 20 21 


CR[crbD] «+ CR[crbA] ® CR[crbB] 
The bit in the condition register specified by crbA is XORed with the bit in the condition 
register specified by crbB and the result is placed into the condition register specified by 
crbD. 
Other registers altered: 
¢ Condition Register: 
Affected: Bit specified by crbD 
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dcbf dcbf 


Data Cache Block Flush Integer Unit 


dcbf rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 3 


EA is the sum (rAlQ)+(rB). 


The action taken depends on the memory mode associated with the target address, and on 
the state of the block. The list below describes the action taken for the various cases. The 
actions described will be executed regardless of whether the page or block containing the 
addressed byte is designated as write-through or if it is in caching-inhibited or caching 
allowed mode. 

* Coherency Required (WIM = xxl) 


— Unmodified Block—Invalidates copies of the block in the caches of all 
processors. 


— Modified Block—Copies the block to memory. Invalidates copies of the block in 
the caches of all processors. 


— Absent Block—If modified copies of the block are in the caches of other 
processors, causes them to be copied to memory and invalidated. If unmodified 
copies are in the caches of other processors, causes those copies to be 
invalidated. 

¢ Coherency Not Required (WIM = xx(Q) 

— Unmoditied Block—Invalidates the block in the processor’s cache. 

— Modified Block—Copies the block to memory. Invalidates the block in the 
processor’s cache. 

— Absent Block—Does nothing. 


This instruction operates as a load from the addressed byte with respect to address 
translation and protection. 


If EA specifies a memory address for which SR[T]=1, the instruction is treated as a no-op. 


Other registers altered: 


« None 
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dcbi dcbi 
Data Cache Block Invalidate Integer Unit 


debi rA,rB 


Reserved 





EA is the sum (rAlQ)+(rB). 


The action taken is dependent on the memory mode associated with the target, and the state 
of the block. The list below describes the action to take if the block containing the byte 
addressed by EA is or is not in the cache. The actions described must be executed regardless 
of whether the page containing the addressed byte is in caching-inhibited or caching- 
allowed mode. This is a supervisor-level instruction. 
¢ Coherency Required (WIM = xx1) 
— Unmodified Block—Invalidates copies of the block in the caches of all 
processors. 
— Modified Block—lInvalidates copies of the block in the caches of all processors. 
(Discards the modified contents.) 
— Absent Block—lIf copies are in the caches of any other processor, causes the 
copies to be invalidated. (Discards any modified contents.) 
¢ Coherency Not Required (WIM = xx0) 
— Unmodified Block—Invalidates the block in the local cache. 


— Modified Block—lInvalidates the block in the local cache. (Discards the modified 
contents.) 


— Absent Block—No action is taken. 


This instruction operates as a store to the addressed byte with respect to address translation 
and protection. The reference and change bits are modified appropriately. If EA specifies a 
memory address for which SR[T]=1, the instruction is treated as a no-op. 

Other registers altered: 


* None 
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dcbst | dcbst 


Data Cache Block Store Integer Unit 


dcbst rA,rB 


Reserved 





EA is the sum (rAl0)+(rB). 


If the block containing the byte addressed by EA is in coherency required mode, and a 
block containing the byte addressed by EA is in the data cache of any processor and has 
been modified, the writing of it to main memory is initiated. 


If the block containing the byte addressed by EA is in coherency not required mode, and a 
block containing the byte addressed by EA is in the data cache of this processor and has 
been modified, the writing of it to main memory is initiated. 


The function of this instruction is independent of the write-through and caching 
inhibited/allowed modes of the page or block containing the byte addressed by EA. 


This instruction operates as a load from the addressed byte with respect to address 
translation and protection. 


If the EA specifies a memory address for an I/O controller interface segment (segment 
register T-bit=1), the dcbst instruction operates as a no-op. 


Other registers altered: 


« None 
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dcbt debt 


Data Cache Block Touch Integer Unit 


dcbt rA,rB 


Reserved 





EA is the sum (rAl0)+(rB). 


This instruction is a hint that performance will probably be improved if the block 
containing the byte addressed by EA is fetched into the data cache, because the program 
will probably soon load from the addressed byte. Executing debt does not cause any 
exceptions to be invoked. 


This instruction operates as a load from the addressed byte with respect to address 
translation and protection except that no exception occurs in the case of a translation fault 
or protection violation. 


If the EA specifies a memory address for which SR[T]=1, the instruction is treated as a no- 
op. 


The purpose of this instruction is to allow the program to request a cache block fetch before 
it is actually needed by the program. The program can later perform loads to put data into 
registers. However, the processor is not obliged to load the addressed block into the data 
cache. If the sector is loaded, it will be either in shared state or exclusive unmodified state. 


Other registers altered: 


¢ None 
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dcbist dcbtst 


Data Cache Block Touch for Store Integer Unit 


dcbtst rA,rB 


Reserved 





EA is the sum (rAIQ)+(rB). 


This instruction is a hint that performance will probably be improved if the block 
containing the byte addressed by EA is fetched into the data cache, because the program 
will probably soon store into the addressed byte. Executing dcbtst does not cause any 
exceptions to be invoked. 


This instruction operates as load from the addressed byte with respect to address translation 
and protection, except that no exception occurs in the case of a translation fault or 
protection violation. Since dcbtst does not modify memory, it is not recorded as a store (the 
change (C) bit is not modified in the page tables). 


If the EA specifies a memory address for which SR[T]=1, the instruction is treated as a no- 
op. 

The debtst instruction behaves exactly like the debt instruction as implemented on the 
MPC601. 


The purpose of this instruction is to allow the program to schedule a cache block fetch 
before it is actually needed by the program. The program can later perform stores to put 
data into memory. However the processor is not obliged to load the addressed block into 
the data cache. 


Other registers altered: 


« None 
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dcbz dcbz 


Data Cache Block Set to Zero Integer Unit 


dcbz rA,rB 
[POWER mnemonic: delz] 


[2] Reserved 





5 6 10 11 15 16 20 21 30 31 


EA is the sum (rAI0)+(rB). 


If the block containing the byte addressed by EA is in the data cache, all bytes of the block 
are cleared to zero. 


If the block containing the byte addressed by EA is not in the data cache and the 
corresponding page is caching allowed, the block is allocated in the data cache without 
fetching the block from main memory, and all bytes of the block are set to zero. 


If the page containing the byte addressed by EA is caching inhibited or write-through, then 
the alignment exception handler is invoked and the handler should clear to zero all bytes of 
the area of memory that corresponds to the addressed block. If the block containing the byte 
addressed by EA is in coherency required mode, and the block exists in the data cache(s) 
of any other processor(s), it is kept coherent in those caches. 


This instruction is treated as a store to the addressed byte with respect to address translation 
and protection. 


If the EA specifies a memory address for an I/O controller interface segment (segment 
register T-bit=1), the dcbz instruction is treated as a no-op. 


See Chapter 5, “Exceptions” for a discussion about a possible delayed machine check 
exception that can occur by use of dcbz if the operating system has set up an incorrect 
memory mapping. 


Other registers altered: 


« None 





10-44 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


divx POWER Architecture Instruction divx 


Divide Integer Unit 

div rD,rA,rB (OE=0 Rc=0) 

div. rD,rA,rB (OE=0 Rc=1) 

divo rD,rA,rB (OE=1 Rc=0) 

divo. rD,rA,rB (OE=1 Rc=1) 

aa er ee ee 
10 11 15 16 20 21 22 30 31 


This instruction is not part of the PowerPC architecture. 


The quotient [(rA)II((MQ)]+(rB) is placed into rD. The remainder is placed in the MQ 
register. The remainder has the same sign as the dividend, except that a zero quotient or a 
zero remainder is always positive. The results obey the equation: 


dividend=(divisor x quotient)+remainder 


where dividend is the original (rA)II(MQ), divisor is the original (rB), quotient is the final 
(rD), and remainder is the final (MQ). 


If Rc=1, then CR bits LT, GT, and EQ reflect the remainder. If OE=1, then SO and OV are 
set to one if the quotient cannot pe represented in 32 bits. For the case of -231 + -1, the MQ 
register is cleared to zero and 298" is placed in rD. For all other overflows, MQ, rD and the 
CRO field are undefined Gif Rc=1). 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO dif Rc=1) 
* XER: 
Affected: SO, OV (if OE=1) 


Note: This instruction is specific to the MPC601. 
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divsx POWER Architecture Instruction divsx 


Divide Short Integer Unit 

divs rD,rA,rB (OE=0 Rc=0) 

divs. rD,rA,rB (OE=0 Rc=1) 

divso rD,rA,rB (OE=1 Rc=0) 

divso. rD,rA,rB (OE=1 Rc=1) 

aE ae ee 
10 11 15 16 20 21 22 30 31 


This instruction is not part of the PowerPC architecture. 


The quotient (rA)+(rB) is placed into rD. The remainder is placed in the MQ register. The 
remainder has the same sign as the dividend, except that a zero quotient or a zero remainder 
is always positive. The results obey the equation: 


dividend=(divisor*quotient)+remainder 


where dividend is the original rA, divisor is the original rB, quotient is the final rD, and 
remainder is the final MQ. 


If Rc=1 then the CR bits LT, GT, and EQ reflect the remainder. If OE=1,then SO and OV 
are set to one if the quotient cannot Se represented in 32 bits (e.g., as is the case when the 
divisor is cach or the dividend is -23" and the divisor is -1), the MQ register is cleared to 
zero and -2°" is placed in rD. For all other overflows, MQ, rD and the CRO field (if Rc=1) 
are undefined. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (If OE=1) 


Note: This instruction is specific to the MPC601. 
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divwx divwx 


Divide Word Integer Unit 
divw rD,rA,rB (OE=0 Rc=0) 
divw. rD,rA,rB (OE=0 Rc=1) 
divwo rD,rA,rB (OE=1 Rc=0) 
divwo. rD,rA,rB (OE=1 Rc=1) 
Hee ee ee ae ee 
10 11 15 16 20 21 22 30 31 
dividend <(rA) 


divisor —(rB) 
rD € dividend + divisor 


Register rA is the 32-bit dividend. Register rB is the 32-bit divisor. A 32-bit quotient is 
formed and placed into rD. The remainder is not supplied as a result. 


Both operands are interpreted as signed integers. The quotient is the unique signed integer 
that satisfies the following: 


dividend=(quotient times divisor)+r 
where 
Os r < Idivisorl 
if the dividend is non-negative, and 
-ldivisorl <r <0 
if the dividend is negative. 
If an attempt is made to perform any of the divisions 
x78000 0000’ / -1 
<anything> / 0 


then the contents of rD are undefined as are (if Rc=1) the contents of the LT, GT, and EQ 
bits of the CRO field. In these cases, if OE=1 then OV is set to 1. 
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Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 


The 32-bit signed remainder of dividing rA by rB can be computed as follows, except in 
the case that rA=-23! and rB=-1. 


divw rD,rA,rB # rD=quotient 
mull rD,rD,rB # rD=quotient*divisor 
subf rD,rD,rA # rD=remainder 
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divwux divwux 


Divide Word Unsigned Integer Unit 
divwu rD,rA,rB (OE=0 Rc=0) 
divwu. rD,rA,rB (OE=0 Rc=1) 
divwuo rD,rA,rB (OE=1 Rc=() 
divwuo. rD,rA,rB (OE=1 Rc=1) 
Fa ae a ee 
10 11 15 16 20 21 22 30 31 
dividend < (rA) 


divisor — (rB) 
rD € dividend + divisor 


The dividend is the contents of rA. The divisor is the contents of rB. A 32-bit quotient is 
formed and placed into rD. The remainder is not supplied as a result. 


Both operands are interpreted as unsigned integers. The quotient is the unique unsigned 
integer that satisfies the following: 


dividend=(quotient * divisor)+r 
where 
Os r < divisor. 


If an attempt is made to divide by zero, the contents of rD are undefined as are (if Rc=1) 
the contents of the LT, GT, and EQ bits of the CRO field. In this case, if OE=1 then OV is 
set to 1. 

Other registers altered: 


* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
« XER: 
Affected: SO, OV (if OE=1) 


The 32-bit signed fomanger of dividing rA by rB can be computed as follows, except in 
the case that rA=-29! and rB=-1. 


divwu rD,rA,rB # rD=quotient 
mull rD,rD,rB # rD=quotient*divisor 
subf rD,rD,rA # rD=remainder 
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dozx POWER Architecture Instruction dozx 


Difference or Zero Integer Unit 

doz rD,rA,rB (OE=0 Rc=0) 

doz. rD,rA,rB (OE=0 Rc=1) 

dozo rD,rA,rB (OE=1 Rc=0) 

dozo. rD,rA,rB (OE=1 Rc=1) 

a a ee 
10 11 15 16 20 21 22 30 31 


This instruction is not part of the PowerPC architecture. 


The sum 7(rA)+(rB) +1 is placed into rD. If the value in rA is algebraically greater than 
the value in rB, rD is set to zero. If Rc=1, the CRO field is set to reflect the result placed in 
rD (i.e., if rD is set to zero, EQ is set to 1). If OE=1, OV can only be set on positive 
overflows. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
° XER: 
Affected: SO, OV (if OE=1) 


Note: This instruction is specific to the MPC601. 
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dozi POWER Architecture Instruction dozi 
Difference or Zero Immediate Integer Unit 


dozi rD,rA,SIMM 


[ae ae eee ee ee si 


0 5 6 10 11 15 16 31 
This instruction is not part of the PowerPC architecture. 


The sum —(rA)+SIMM +1 is placed into rD. If the value in rA is algebraically greater than 
the value of the SIMM field, rD is set to zero. 


Other registers altered: 


« None 


Note: This instruction is specific to the MPC601. 
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eciwx eciwx 


External Control Input Word Indexed Integer Unit 


eciwx rD,rA,rB 


Reserved 





if rA=0 then b — 0 
else b < (rA) 
EA <— b+(rB) 
if EAR[E]=1 then 
paddr < address translation of EA 
send load request for paddr to device identified by EAR[RID] 
rD € word from device 
else 
DSISR[11] <— 1 
generate data access exception 


EA is the sum (rAI0)+(rB). 


If EAR[E]=1, a load request for the physical address corresponding to EA is sent to the 
device identified by EAR[RID], bypassing the cache. The word returned by the device is 
placed in rD. The EA sent to the device must be word aligned. 


If EAR[E]=0, a data access exception is taken, with bit 11 of DSISR set to 1. 


The eciwx instruction is supported for effective addresses that reference ordinary 
(SR[T]=0) segments, for EAs mapped by the BAT registers, and for EAs generated when 
MSR[DT]=0 (direct translation). The instruction is treated as a no-op for EAs that 
correspond to I/O controller interface (SR[T]=1) segments. 


The access caused by this instruction is treated as a load from the location addressed by EA 
with respect to protection and reference and change recording. 


This instruction is defined as an optional instruction by the PowerPC architecture, and may 
not be available in all PowerPC implementations. 


Other registers altered: 


¢ None 
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ecowx ecowx 


External Control Output Word Indexed Integer Unit 


ecowx rS,rA,rB 


Reserved 





if rA=0 then b — 0 
else b ¢ (rA) 
EA ¢ b+(rB) 
if EAR[E]=1 then 
paddr < address translation of EA 
send store request for paddr to device identified by EAR[RID] 
send rS to device 
else 
DSISR[{11] — 1 
generate data access exception 


EA is the sum (rAl0)+(rB). 


If EAR[E]=1, a store request for the physical address corresponding to EA and the contents 
of rS are sent to the device identified by EAR[RID], bypassing the cache. The EA sent to 
the device must be word aligned. 


If EAR[E]=0, a data access exception is taken, with bit 11 of DSISR set to 1. 


The ecowx instruction is supported for effective addresses that reference ordinary 
(SR[T]=0) segments, for EAs mapped by the BAT registers, and for EAs generated when 
MSR[DT]=0 (direct translation). The instruction is treated as a no-op for EAs that 
correspond to I/O controller interface (SR[T]=1) segments. The access caused by this 
instruction is treated as a store to the location addressed by EA with respect to protection 
and reference and change recording. 


This instruction is defined as an optional instruction by the PowerPC architecture, and may 
not be available in all PowerPC implementations. 


Other registers altered: 


« None 
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eleio eleio 


Enforce In-Order Execution of I/O Integer Unit 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The eieio instruction provides an ordering function for the effects of load and store 
instructions executed by a given processor. Executing an eieio instruction ensures that all 
memory accesses previously initiated by the given processor are complete with respect to 
main memory before any memory accesses subsequently initiated by the given processor 
access main memory. 


The synchronize (sync) and the enforce in-order execution of I/O (eieio) instructions are 
handled in the same manner internally to the MPC601. These instructions delay execution 
of subsequent instructions until all previous instructions have completed to the point that 
they can no longer cause an exception, all previous memory accesses are performed 
globally, and the sync or eieio operation is broadcast onto the MPC6()1 bus interface. 


eieio orders loads/stores to caching inhibited memory and stores to write-through required 
memory. 


Other registers altered: 


« None 


The eieio instruction is intended for use only in performing memory-mapped I/O 
operations and to prevent load/store combining operations in main memory. It can be 
thought of as placing a barrier into the stream of memory accesses issued by a processor, 
such that any given memory access appears to be on the same side of the barrier to both the 
processor and the I/O device. 


The eieio instruction may complete before previously initiated memory accesses have been 
performed with respect to other processors and mechanisms. 
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eqvx eqvx 


Equivalent Integer Unit 
eqv rA,rS,rB (Rc=0) 
eqv. rA,rS,rB (Rc=1) 
Fe Re NE 
10 11 15 16 21 22 30 31 
rA — —((rS) =(rB)) 


The contents of rS is XORed with the contents of rB and the complemented result is placed 
into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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extsbx extsbx 


Extend Sign Byte Integer Unit 
extsb rA,rS (Rc=0) 
extsb. rA,rs (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


S — rS[24] 
rA[24—-31] <— rS[24-31] 
rA[0—23] < (24)S 


The contents of rS[24~—31] are placed into rA[24—31]. Bit 24 of rS is placed into rA[Q—23]. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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extshx extshx 


Extend Sign Half Word Integer Unit 
extsh rA,rS (Rce=0) 
extsh. rA,rS (Rc=1) 


[POWER mnemonics: exts, exts. ] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


S<rs[16] 
rA[ 16-31} rS[16-31] 
rA[O-15] — (16)S 


The contents of rS[16-31] are placed into rA[ 16-31]. Bit 16 of rS is placed into rA[{Q—15]. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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fabsx fabsx 


Floating-Point Absolute Value - Floating-Point Unit 
fabs frD,frB (Rc=(0) 
fabs. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The contents of frB with bit 0 cleared to zero is placed into frD. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
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faddx faddx 


Floating-Point Add (Single-Precision) Floating-Point Unit 
fadd frD,frA,frB (Rc=()) 
fadd. frD,frA,frB (Rc=1) 


[POWER mnemonics: fa, fa. ] 


Reserved 





fadds frD,frA,frB (Rce=0) 
fadds. frD,frA,frB (Rc=1) 


Reserved 





0 5 6 10 11 15 16 20 21 25 26 30 31 


The floating-point operand in frA is added to the floating-point operand in frB. If the most 
significant bit of the resultant significand is not a one, the result is normalized. The result 
is rounded to the target precision under control of the floating-point rounding control field 
RN of the FPSCR and placed into frD. 


Floating-point addition is based on exponent comparison and addition of the two 
significands. The exponents of the two operands are compared, and the significand 
accompanying the smaller exponent is shifted right, with its exponent increased by one for 
each bit shifted, until the two exponents are equal. The two significands are then added 
algebraically to form an intermediate sum. All 53 bits in the significand as well as all three 
guard bits (G, R, and X) enter into the computation. 


If a carry occurs, the sum's significand is shifted right one bit position and the exponent is 
increased by one. FPSCR[FPRF] is set to the class and sign of the result, except for invalid 
operation exceptions when FPSCR[VE]=1. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI 
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fcmpo fcmpo 


Floating-Point Compare Ordered Floating-Point Unit 
fempo crfD,frA,frB 


Reserved 





0 5 6 8 9 10 11 15 16 20 21 30 31 


The floating-point operand in frA is compared to the floating-point operand in frB. The 
result of the compare is placed into CR Field erfD and the FPCC. 


If one of the operands is a NaN, either quiet or signaling, then CR Field erfD and the FPCC 
are set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set, 
and if invalid operation is disabled (VE=0) then VXVC is set. Otherwise, if one of the 
operands is a QNaN then VXVC is set. 
Other registers altered: 

¢ Condition Register (CR Field specified by operand erfD): 


Affected: FPCC, FX, VXSNAN, VXVC 
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fempu fempu 


Floating-Point Compare Unordered Floating-Point Unit 


fempu erfD,frA,frB 


Reserved 





0 5 6 8 9 10 11 15 16 20 21 30 31 


The floating-point operand in register frA is compared to the floating-point operand in 
register frB. The result of the compare is placed into CR Field crfD and the FPCC. 


If one of the operands is a NaN, either quiet or signaling, then CR Field crfD and the FPCC 
are set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set. 


Other registers altered: 
* Condition Register (CR Field specified by operand erfD): 
Affected: FPCC, FX, VXSNAN 
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fctiwx fctiwx 


Floating-Point Convert to Integer Word Floating-Point Unit’ 
fctiw frD,frB (Rc=0) 
fctiw. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The floating-point operand in register frB is converted to a 32-bit signed integer, using the 
rounding mode specified by FPSCR[RN], and placed in bits 32-63 of frD. Bits 0-31 of frD 
are undefined. 


If the contents of frB is greater than 31.4 , bits 32-63 of frD are set to x '7FFF_FFFF'’. 
If the contents of frB is less than 291 bits 32-63 of frD are set to x '8000_0000 '. 


The conversion is described fully in Appendix F.2, “Conversion from Floating-Point 
Number to Unsigned Fixed-Point Integer Word.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. 
FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result 
is inexact. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI 
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fctiwzx fctiwzx 


Floating-Point Convert to Integer Word with Round toward Zero Floating-Point Unit 
fctiwz frD,frB (Rce=() 
fctiwz. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The floating-point operand in register frB is converted to a 32-bit signed integer, using the 
rounding mode round toward zero, and placed in bits 32-63 of frD. Bits 0-31 of frD are 
undefined. 


If the operand in frB is greater than ys bits 32-63 of frD are set to x '7FFF_FFFF '. 
If the operand in frB is less than -2°', bits 32-63 of frD are set to x '8000_0000'. 


The conversion is described fully in Appendix F.2, “Conversion from Floating-Point 
Number to Unsigned Fixed-Point Integer Word.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. 
FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result 
is inexact. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI 
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fdivx fdivx 


Floating-Point Divide (Single-Precision) Floating-Point Unit 
fdiv frD,frA,frB (Rc=0) 
fdiv. frD,frA,frB (Rc=1) 


[POWER mnemonics: fd, fd.] 


Reserved 





0 5 6 10 11 15 16 20 21 25 26 30 31 


fdivs frD,frA,frB (Rc=0) 
fdivs. frD,frA,frB (Rc=1) 


Reserved 





0 5 6 10 11 15 16 20 21 25 26 30 31 


The floating-point operand in register frA is divided by the floating-point operand in 
register frB. No remainder is preserved. 


If an operand is a denormalized number then it is prenormalized before the operation is 
started. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


Floating-point division is based on exponent subtraction and division of the significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1 and zero divide exceptions when FPSCR[ZE]=1. 


Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, ZX, XX, VXSNAN, VXIDI, VXZDZ 
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fmaddx fmaddx 


Floating-Point Multiply-Add (Single-Precision) Floating-Point Unit 
fmadd frD,frA,frC,frB (Re=0) 
fmadd. _ frD,frA,frC,frB (Rc=1) 


[POWER mnemonics: fma, fma.] 


10 11 15 16 20 21 25 26 30 31 
fmadds _ frD,frA,frC,frB (Rc=0) 
fmadds.  frD,frA,frC,frB (Rc=1) 
a ee ee 
0 5 6 10 11 15 16 20 21 25 26 30 31 


The following operation is performed: 


frD <— [(frA)*(frC)]+(frB) 
The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 


If an operand is a denormalized number then it is prenormalized before the operation is 
started. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ 
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fmrx fmrx 


Floating-Point Move Register Floating-Point Unit 
fmr frD,frB (Rc=0) 
fmr. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The contents of register frB is placed into frD. 


Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
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fmsubx fmsubx 


Floating-Point Multiply-Subtract (Single-Precision) Floating-Point Unit 
fmsub frD,frA,frC,frB (Rc=0) 
fmsub. _ frD,frA,frC,frB (Rc=1) 


{POWER mnemonics: fms, fms.] 


0 5 6 10 11 15 16 20 21 25 26 30 31 
fmsubs___ frD,frA,frC,frB (Rc=0) 
fmsubs.  frD,frA,frC,frB (Rc=1) 
eae ee a a ee 
0 5 6 10 11 15 16 20 21 25 26 30 31 


The following operation is performed: 


frD — [(frA)*(frC)] - (frB) 


The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If an operand is a denormalized number then it is prenormalized before the operation is 
started. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ 





MOTOROLA Chapter 10. Instruction Set 10-67 


fmulx fmulx 


Floating-Point Multiply (Single-Precision) Floating-Point Unit 
fmul frD,frA,frC (Rc=0) 
fmul. frD,frA,frC (Rc=1) 


[POWER mnemonics: fm, fm.] 


Reserved 








0 5 6 10 11 25 26 30 31 
fmuls frD,frA,frC (Rc=0) 
fmuls. frD,frA,frC (Rc=1) 





The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. 


If an operand is a denormalized number then it is prenormalized before the operation is 
started. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


Floating-point multiplication is based on exponent addition and multiplication of the 
significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 


Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
* Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXIMZ 
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fnabsx fnabsx 


Floating-Point Negative Absolute Value Floating-Point Unit 
fnabs frD,frB (Rce=0) 
fnabs. frD,frB (Rc=1) 

Reserved 





The contents of register frB with bit () set to one is placed into frD. 


Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
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fnegx fnegx 


Floating-Point Negate Floating-Point Unit 
fneg frD,frB (Rc=0) 
fneg. frD,frB (Rc=1) 

Reserved 





The contents of register frB with bit 0 inverted is placed into frD. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
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fnmaddx fnmaddx 


Floating-Point Negative Multiply-Add (Single-Precision) Floating-Point Unit 
fnmadd = frD,frA,frC,frB (Rc=0) 
fnmadd. frD,frA,frC,frB (Rc=1) 


[POWER mnemonics: fnma, fnma. ] 


0 5 6 10 11 15 16 20 21 25 26 30 31 
fnmadds_ frD,frA,frC,frB (Rc=0) 
fnmadds. frD,frA,frC,frB (Rc=1) 
[ee ee ee aac 
0 5 6 10 11 15 16 20 21 25 26 30 31 


The following operation is performed: 


frD <— -({(frA)*(frC)]+(frB)) 


The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 
If an operand is a denormalized number then it is prenormalized before the operation is 
started. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR, then negated and placed into frD. 


This instruction produces the same result as would be obtained by using the floating-point 
multiply-add instruction and then negating the result, with the following exceptions: 
¢ QNaNs propagate with no effect on their sign bit. 


¢ QNaNs that are generated as the result of a disabled invalid operation exception 
have a sign bit of zero. 


¢ SNaNs that are converted to QNaNs as the result of a disabled invalid operation 
exception retain the sign bit of the SNaN. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
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Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, Fl, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ 
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fnmsubx fnmsubx 


Floating-Point Negative Multiply-Subtract (Single-Precision) Floating-Point Unit 
fnmsub _ frD,frA,frC,frB (Rc=(0) 
fnmsub.  frD,frA,frC,frB (Rc=1) 


[POWER mnemonics: fnms, fnms. 


0 5 6 10 11 15 16 20 21 25 26 30 31 
fnmsubs_ frD,frA,frC,frB (Rc=0) 
fnmsubs. frD,frA,frC,frB (Rc=1) 
0 5 6 10 11 15 16 20 21 25 26 30 31 


The following operation is performed: 


frD <— -({((frA)*(frC)] - (frB)) 


The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If an operand is a denormalized number, it is prenormalized before the operation is started. 
If the most significant bit of the resultant significand is not one, the result is normalized. 
The result is rounded to the target precision under control of the floating-point rounding 
control field RN or the FPSCR, then negated and placed into frD. 


This instruction produces the same result obtained by negating the result of a floating 
multiply-subtract instruction with the following exceptions: 
* QNaNs propagate with no effect on their sign bit. 


¢ QNaNs that are generated as the result of a disabled invalid operation exception 
have a sign bit of zero. 


¢ SNaNs that are converted to QNaNs as the result of a disabled invalid operation 
exception retain the sign bit of the SNaN. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 


cannes nanan baapesebnenaenaehaennsaehesasanasannasea nana asganaasassenscasennentes 
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Other registers altered: 
¢ Condition Register (CRI Field) — 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ 


10-74 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


frspx | frspx 


Floating-Point Round to Single-Precision Floating-Point Unit 
frsp frD,frB (Rc=0) 
frsp. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 30 31 


If it is already in single-precision range, the floating-point operand in register frB is placed 
into frD. Otherwise the floating-point operand in register frB is rounded to single-precision 
using the rounding mode specified by FPSCR[RN] and placed into frD. 


The rounding is described fully in Appendix F.1, “Conversion from Floating-Point Number 
to Signed Fixed-Point Integer Word.” 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN 
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fsubx _ fsubx 


Floating-Point Subtract (Single-Precision) Floating-Point Unit 
fsub frD,frA,frB (Rc=0) 
fsub. frD,frA,frB (Rc=1) 


[POWER mnemonics: fs, fs. ] 





Reserved 
0 5 6 10 11 15 16 20 21 25 26 30 31 
fsubs frD,frA,frB (Rc=() 
fsubs. frD,frA,frB (Rc=1) 





0 5 6 10 11 15 16 20 21 25 26 30 31 


The floating-point operand in register frB is subtracted from the floating-point operand in 
register frA. If the most significant bit of the resultant significand is not a one the result is 
normalized. The result is rounded to the target precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


The execution of the floating-point subtract instruction is identical to that of floating-point 
add, except that the contents of frB participates in the operation with its sign bit (bit 0) 
inverted. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI 
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icbi icbi 
Instruction Cache Block Invalidate Integer Unit 


icbi rA,rB 


Reserved 





EA is the sum (rAl0)+(rB) 


In other PowerPC processors, if the block containing the byte addressed by EA is in 
coherency required mode, and a block containing the byte addressed by EA is in the 
instruction cache of any processor, the block is made invalid in all such processors, so that 
subsequent references cause the block to be refetched. 


Also, if the block containing the byte addressed by EA is in coherency not required mode, 
and a block containing the byte addressed by EA is in the instruction cache of this 
processor, the block is made invalid in this processor, so that subsequent references cause 
the block to be fetched from main memory (or perhaps from a data cache). 


The MPC601 treats the icbi instruction as a no-op, even to the extent of not validating the 
EA. 


Other registers altered: 


« None 
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isync isync 
Instruction Synchronize Integer Unit 


isynce 


[POWER mnemonic: ics] 


Reserved 






0 56 1011 1516 20 21 30 31 


This instruction waits for all previous instructions to complete and then discards any 
prefetched instructions, causing subsequent instructions to be fetched (or refetched) from 
memory and to execute in the context established by the previous instructions. This 
instruction has no effect on other processors or on their caches. 


This instruction is context synchronizing. 
Other registers altered: 


¢ None 
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Ibz lbz 


Load Byte and Zero Integer Unit 
Ibz rD,d(rA) 
eet oe ae ee ne ee el 
0 5 6 10.11 15 16 31 
if rA=0 then b — 0 
else b © (rA) 
EA < b+EXTS(d) 


rD < (24)0 Il MEM(EA, 1) 
The effective address is the sum (rAl0) + d. The byte in memory addressed by EA is loaded 
into rD[24-31]. Bits rD[0—23] are cleared to 0. 
Other registers altered: 


« None 
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Ibzu Ibzu 


Load Byte and Zero with Update Integer Unit 
Ibzu rD,d(rA) 


oa ee ae ee ee ee eee 
0 


5 6 10 11 15 16 31 


if rA=0 then b — 0 
else b — (rA) 
EA < b+EXTS(d) 
rDe(24)0 | MEM(BA, 1) 
rAcEA 
EA is the sum (rAlQ) + d. The byte in memory addressed by EA is loaded into rD[24—31]. 


Bits rD{0-23] are cleared to 0). 
EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


* None 
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Ibzux [bzuXx 


Load Byte and Zero with Update Indexed Integer Unit 


Ibzux rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b — 0 

else b ¢ (rA) 

EA € b+(rB) 

rD — (24)0 ll MEM(EA, 1) 

rA<—EA 
EA is the sum (rAlQ) + (rB). The byte addressed by EA is loaded into rD[24-31]. Bits 
rD[Q—23] are set to 0). 


EA is placed into rA. 


If operand rA=(0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


« None 
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Ibzx Ibzx 


Load Byte and Zero Indexed ~ Integer Unit 


Ibzx rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 


if rA=0 then b — 0 

else b € (rA) 

EA © b+(rB) 

rD <— (24)0 Il MEM(EA, 1) 


EA is the sum (rAlQ) + (rB). The byte in memory addressed by EA is loaded into 
rD[24-31]. 


Bits rD[Q—23] are set to 0. 


Other registers altered: 


¢ None 
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lfd lfd 


Load Floating-Point Double-Precision Integer Unit and 


Floating-Point Unit 


Ifd frD,d(rA) 
EE ae (a | ee ee ee 
0 5 6 10 11 15 16 31 


if rA=0 then h — 0 
else b | (rA) 
EA — b+EXTS(d) 

frD — MEM(EA, 8) 


EA is the sum (rAlO) + d. 
The double word in memory addressed by EA is placed into frD. 


Other registers altered: 


e None 
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lfdu lfdu 


Load Floating-Point Double-Precision with Update Integer Unit and 


Floating-Point Unit 


Ifdu frD,d(rA) 
a ea a ee ee eee 
0 5 6 10 11 15 16 31 
if rA=0 then bh — 0 
else b — (rA) 
EA — b+EXTS(d) 
frD — MEM(EA, 8) 
rA<EA 


EA is the sum (rAlQ) + d. 
The double word in memory addressed by EA is placed into frD. 


EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0. The PowerPC architecture 
defines load with update instructions with operand rA=0 as an invalid form. 


Other registers altered: 


* None 
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lfdux lfdux 


Load Floating-Point Double-Precision with Update Indexed Integer Unit and 
Floating-Point Unit 


Ifdux frD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then bh © 0 
else b <— (rA) 
EA < b+(rB) 

frD — MEM(EA, 8) 
rA< EA 


EA is the sum (rAlQ) + (rB). 
The double word in memory addressed by EA is placed into frD. 
EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0. The PowerPC architecture 
defines load with update instructions with operand rA=0 as an invalid form. 
Other registers altered: 


« None 
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lfdx lfdx 


Load Floating-Point Double-Precision Indexed Integer Unit and 
Floating-Point Unit 
Ifdx frD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=() then b — 0 
else b + (rA) 
EA — b+(rB) 

frD — MEM(EA, 8) 


EA is the sum (rAl0) + (rB). 
The double word in memory addressed by EA is placed into frD. 


Other registers altered: 


¢ None 
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lfs — |fs 


Load Floating-Point Single-Precision Integer Unit and 


Floating-Point Unit 


Ifs frD,d(rA) 
a ee ee ee ee ee eee See 
0 5 6 10 11 15 16 31 


if rA=0 then b —0 

else b & (rA) 

EA — b+EXTS(d) 

frD — DOUBLE(MEM EA, 4)) 


EA is the sum (rAlQ) + d. 

The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section 3.6.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions”) and placed into frD. 
Other registers altered: 


« None 
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lfsu Ifsu 


Load Floating-Point Single-Precision with Update Integer Unit and 
Floating-Point Unit 
Ifsu frD,d(rA) 


ee ee ee ee 
0 


5 6 10 11 15 16 31 


if rA=0 then b — 0 

else b <(rA) 

EA © b+EXTS(d) 

frD — DOUBLE(MEMCEA, 4)) 
rA<EA 


EA is the sum (rAlQ) + d. 
The word in memory addressed by EA is interpreted as a floating-point single-precision 


operand. This word is converted to floating-point double-precision (see Section 3.6.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions”) and placed into frD. 


EA 1s placed into rA. 


If operand rA=0 the MPC601 does not update register r0. The PowerPC architecture 
defines load with update instructions with operand rA=0 as an invalid form. 
Other registers altered: 


¢ None 
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lfsux lfsux 


Load Floating-Point Single-Precision with Update Indexed Integer Unit and 
Floating-Point Unit 
Ifsux frD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b — 0 

else b < (rA) 

EA © b+(rB) 

frD + DOUBLE(MEMC(EA, 4)) 
rA< EA 


EA is the sum (rAlQ) + (rB). 
The word in memory addressed by EA is interpreted as a floating-point single-precision 


operand. This word is converted to floating-point double-precision (see Section 3.6.9.1, 
‘“‘Double-Precision Conversion for Floating-Point Load Instructions”) and placed into frD. 


EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0. The PowerPC architecture 
defines load with update instructions with operand rA=0 as an invalid form. 


Other registers altered: 


¢ None 
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Ifsx lfsx 


Load Floating-Point Single-Precision Indexed Integer Unit and 
Floating-Point Unit 


Ifsx frD,rA,rB 


Reserved 





if rA=0 then b — 0 

else b ¢ (rA) 

EA < b+(rB) 

frD — DOUBLE(MEMCEA, 4)) 


EA is the sum (rAlQ) + (rB). 

The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section 3.6.9.1, 
“Double-Precision Conversion for Floating-Point Load Instructions”) and placed into frD. 
Other registers altered: 


* None 
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lha lha 


Load Half Word Algebraic Integer Unit 
lha rD,d(rA) 
(aera i: ee ae, Se eee ene 
0 5 6 10 11 15 16 31 
if rA=0 then b — 0 
else b € (rA) 
EA <— b+EXTS(d) 
rD + EXTS(MEM(EA, 2)) 


EA is the sum (rAl0) + d. The half word in memory addressed by EA is loaded into 
rD[16-31]. Bits rD[O—15] are filled with a copy of bit 0 of the loaded half word. 


Other registers altered: 


« None 
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Ihau lhau 
Load Half Word Algebraic with Update Integer Unit 


Ihau rD,d(rA) 


CE ee a ae ae | ee, eee 
0 


5 6 10 11 15 16 31 


if rA=0 then b <0 

else b + (rA) 

EA <— b+EXTS(d) 

rD — EXTS(MEMCEA, 2)) 
rA< EA 


EA is the sum (rAlQ) + d. The half word in memory addressed by EA is loaded into rD[16— 
31]. 

Bits rD[Q—15] are filled with a copy of bit 0 of the loaded half word. 

EA is placed into rA. 


If operand rA=(0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


¢ None 
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Ihaux ihaux 


Load Half Word Algebraic with Update Indexed Integer Unit 


Ihaux rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b — 0 

else b & (rA) 

EA < b+(rB) 

rD — EXTS(MEM(EA, 2)) 

rA< EA 
EA is the sum (rAl0) + (rB). The half word in memory addressed by EA is loaded into 
rD[16-31]. Bits rD[Q—15] are filled with a copy of bit 0 of the loaded half word. 


EA is placed into rA. 


If operand rA=( the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


* None 
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[hax Ihax 


Load Half Word Algebraic Indexed Integer Unit 
Ihax rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b — 0 

else b < (rA) 

EA — b+(rB) 

rD — EXTS(MEMCEA, 2)) 


EA is the sum (rAl0) + (rB). The half word in memory addressed by EA is loaded into 


_rD[16-31]. Bits rD[O—15] are filled with a copy of bit 0 of the loaded half word. 


Other registers altered: 


« None 
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Inbrx Ihbrx 


Load Half Word Byte-Reverse Indexed Integer Unit 


Ihbrx rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b — 0 

else b — (rA) 

EA — b+(rB) 

rD — (16)0 ! MEM(EA+1, 1) ll MEM(BA, 1) 
EA is the sum (rAl0) + (rB). Bits 0-7 of the half word in memory addressed by EA are 
loaded into rD[24~-31]. Bits 8-15 of the half word in memory addressed by EA are loaded 
into rD[16—23]. Bits rD[Q—15] are cleared to 0. 


The PowerPC architecture cautions programmers that some implementations of the 
architecture may run the Ihbrx instructions with greater latency than other types of load 
instructions. This is not the case in the MPC601. This instruction operates with the same 
latency as other load instructions. 


Other registers altered: 


« None 
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Ihz Ihz 


Load Half Word and Zero Integer Unit 
thz rD,d(rA) 

ee at ee rc ae et 
0 5 6 10 11 15 16 31 


if rA=0 then b<-0 

else b<erA 

EA < b+EXTS(d) 

rD — 0(16)0 Il MEM(EA, 2) 


EA is the sum (rAlQ) + d. The half word in memory addressed by EA is loaded into rD[16— 
31]. Bits rD[0-15] are cleared to (). 


Other registers altered: 


« None 
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Ihzu Ihzu 


Load Half Word and Zero with Update Integer Unit 
Ihzu rD,d(rA) 


ae a ee, Senne, 
0 


5 6 10 11 15 16 31 


if rA=0 then b — 0 

else b — (rA) 

EA <— b+EXTS(d) 

rD <— (16)0 ll MEM(EA, 2) 

rA — EA 
EA is the sum (rAIQ) + d. The half word in memory addressed by EA is loaded into rD[16— 
31]. Bits rD[O0—15] are cleared to 0. 


EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=() or rA=rD as invalid forms 


Other registers altered: 


« None 





MOTOROLA Chapter 10. Instruction Set 10-97 


Ihzux Ihzux 
Load Half Word and Zero with Update Indexed , Integer Unit 


Ihzux rD,rA,rB 


Reserved - 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b <— 0 

else b € (rA) 

EA © b+(rB) 

rD<(16)0 Il MEM(EA, 2) 

rA¢-EA 
EA is the sum (rAlQ) + (rB). The half word in memory addressed by EA is loaded into 
rD[16—31]. Bits rD[Q-15] are cleared to 0). 


EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


« None 
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Inzx Ihzx 


Load Half Word and Zero Indexed Integer Unit 


Ihzx rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b—-0 

else be-rA 
EA€—b+rB 

rD€(16)0 Il MEM(EA, 2) 


The effective address is the sum (rAlQ) + (rB). The half word in memory addressed by EA 
is loaded into rD[16-31]. Bits rD[Q—15] are cleared to 0. 
Other registers altered: 


¢ None 
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Imw Imw 
Load Multiple Word Integer Unit 


Imw rD,d(rA) 
[POWER mnemonic: Im] 


eae ee ee ae ee ees 
5 


0 6 10 11 15 16 31 


if rA=0 then b<—0 


else be-rA 
EA€—-b+EXTS(d) 
ré—rD 


do while r < 31 
GPR(r)<—- MEM(EA, 4) 
re—r+l 
EA<-EA+4 


EA is the sum (rAlQ) + d. 
n=(32-D). 


nconsecutive words starting at EA are loaded into the 32 bits of GPRs rD through r31. EA 
must be a multiple of 4; otherwise, the system alignment exception handler is invoked if 
the load crosses a page boundary. 


If rA is in the range of registers specified to be loaded, it will be skipped in the load process. 
If operand rA=0, the register is not considered as used for addressing, and will be loaded. 


Other registers altered: 


e None 
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iscbxx POWER Architecture Instruction iscbxx 


Load String and Compare Byte Indexed Integer Unit 
Iscbx rD,rA,rB (Rc=0) 

Iscbx. rD,rA,rB (Rc=() 

0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 
EA is the sum (rAlQ) + (FB). XER[25-31] contains the byte count. Register rD is the 
starting register. 


n=XER[25-31], which is the number of bytes to be loaded. nr=(n/4), which is the number 
of registers to receive data. 


Starting with the leftmost byte in rD, consecutive bytes in memory addressed by the EA 
are loaded into rD through rD+nr-1, wrapping around back through GPR (0) if required, 
until either a byte match is found with XER[16—23] or n bytes have been loaded. If a byte 
match is found, that byte is also loaded. 


Bytes are always loaded left to right in the register. In the case when a match was found 
before bytes were loaded, the contents of the rightmost byte(s) not loaded of that register 
and the contents of all succeeding registers up to and including rD+nr-1 are undefined. 
Also, no reference is made to memory after the matched byte is found. In the case when a 
match was not found, the contents of the rightmost byte(s) not loaded of rD+nr-1 is 
undefined. 


When XER[25-—31 ]=0, the content of rD is undefined. 


The count of the number of bytes loaded up to and including the matched byte, if a match 
was found, is placed in XER[25—31]. 


If rA and rB are in the range of registers specified to be loaded, it will be skipped in the 
load process. If operand rA=0, the register is not considered as used for addressing, and 
will be loaded. 


Other registers affected: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO Gf Rc=1) 
¢ XER: 
Affected: XER[25-3 1 ]=# of bytes loaded 
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Note: If Rc=1 and XER[25-~31]=0 then the CRO field is undefined. If Rc=1 and 
XER[25—31]#0 then the CRO field is set as follows: 


LT, GT, EQ, SO =b’00” Il match Il XER(SO) 


10-102 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


Iswi ISwi 


Load String Word Immediate Integer Unit 


Iswi rD,rA,NB 
[POWER mnemonic: Isi] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then EA<—O 


else EAerA 
if NB=0 then n¢-32 
else n<NB 
r—rD - | 

i<—32 


do while n20 
if i=32 then 
ré—r+l (mod 32) 
GPR(r)<-0 
GPR(n[i-i+7]} --MEM(EA, 1) 
ii+8 
The EA is (rA10). 


Let n=NB if NB4Q, n=32 if NB=0; 1 is the number of bytes to load. Let nr=CEIL(n/4); nr 
is the number of registers to be loaded with data. 


n consecutive bytes starting at the EA are loaded into GPRs rD through rD+nr-1. Bytes are 
loaded left to right in each register. The sequence of registers wraps around to rO if required. 
If the four bytes of register rD+nr-1 are only partially filled, the unfilled low-order byte(s) 
of that register are cleared to 0). 


If rA is in the range of registers specified to be loaded, it will be skipped in the load process. 
If operand rA=0, the register is not considered as used for addressing, and will be loaded. 


Other registers altered: 


e None 
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Iswx | Iswx 
Load String Word Indexed Integer Unit 


Iswx rD,rA,rB 
[POWER mnemonic: Isx] 


Reserved 





if rA=0 then b-0 
else bé-rA 
EA¢€-b+rB 
nXER[25-31]} 
re—-rD - | 
i~32 
do while n> 0 
if i=32 then 
ré—r+1 (mod 32) 
GPR(r)<—0 
GPR(r)[i-i+7]}<-MEM(EA, 1) 
i<-i+8 


EA is the sum (rAl0)+(rB). Let n=XER[25-31]; n is the number of bytes to load. Let 
nr=CEIL(n/4): nr is the number of registers to receive data. 


If n>O, n consecutive bytes starting at EA are loaded into GPRs rD through rD+nr-1. 


Bytes are loaded left to right in each register. The sequence of registers wraps around 
through r@ if required. If the bytes of rD+nr-1 are only partially filled, the unfilled low- 
order byte(s) of that register are cleared to (). 


If n=(, the content of rD is undefined. 


If rA and rB are in the range of registers specified to be loaded, it will be skipped in the 
load process. If operand rA=(), the register is not considered as used for addressing, and 
will be loaded. 


Other registers altered: 


¢ None 
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lwarx lwarx 


Load Word and Reserve Indexed Integer Unit 


Iwarx rD,rA,rB 


Reserved 





if rA=(0 then b-0 

else berA 
EA<-b+rB 

RESERVE<—- 1 
RESERVE_ADDR€func(EA) 
rD¢-MEMC(EA,4) 


EA is the sum (rAlQ) + (rB). The word in memory addressed by EA is loaded into rD. 

This instruction creates a reservation for use by a store word conditional instruction. An 
address computed from the EA is associated with the reservation, and replaces any address 
previously associated with the reservation: the manner in which the address to be associated 


with the reservation is computed from the EA is described in Section 3.1.1, “Effective 
Address Calculation”. 


The EA must be a multiple of 4. If it is not, the alignment exception handler will be invoked 
if the load crosses a page boundary, or the results will be boundedly undefined. 


Other registers altered: 


« None 
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Iwbrx Iwbrx 


Load Word Byte-Reverse Indexed Integer Unit 


Iwbrx rD,rA,rB 
[POWER mnemonic: Ibrx] 


Reserved 





0 5-6 10 11 15 16 20 21 30 31 


if rA=0 then b¢—-O 

else berA 

EA¢—b+rB 

rD<MEM(EA+43, 1) II MEM(EA+42, 1) 

| MEM(EA+1, 1) ll MEM(BA, 1) 

EA is the sum (rAI0)+(rB). Bits 0-7 of the word in memory addressed by EA are loaded 
into rD[24-31]. Bits 8—15 of the word in memory addressed by EA are loaded into 
rD[16—23]. Bits 16-23 of the word in memory addressed by EA are loaded into rD[8—15]. 
Bits 24-31 of the word in memory addressed by EA are loaded into rD[0-7]. 


The PowerPC architecture cautions programmers that some implementations of the 
architecture may run the Iwbrx instructions with greater latency than other types of load 
instructions. This is not the case in the MPC601. This instruction operates with the same 
latency as other load instructions. 


Other registers altered: 


« None 
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Iwz Ilwz 


Load Word and Zero Integer Unit 


lwz rD,d(rA) 
[POWER mnemonic: 1] 


Li eet ct cle ea ae ee 
0 


5:6 10 11 15 16 31 


if rA=0 then bé—0 
else be-rA 
EAC-bt+EXTS(d) 
rD¢-MEMCEA, 4) 


EA is the sum (rAlQ) + d. The word in memory addressed by EA is loaded into rD. 


Other registers altered: 


e None 
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lwzu Iwzu 


Load Word and Zero with Update Integer Unit 


Iwzu rD,d(rA) 
[POWER mnemonic: lu] - 


ee ee ee ee ee | eee eee 
0 


5 6 10 11 15 16 31 


if rA=0 then b — 0 
else b = (rA) 
EA © b+EXTS(d) 
rD<-MEMCEA, 4) 
rA<-EA 


EA is the sum (rAlQ) + d. The word in memory addressed by EA is loaded into rD. 
EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into rD and the register update is suppressed. The PowerPC architecture defines 
load with update instructions with operand rA=0 or rA=rD as invalid forms. 

Other registers altered: 


« None 
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Ilwzux lwzux 


Load Word and Zero with Update Indexed Integer Unit 


lwzux rD,rA,rB 
[POWER mnemonic: lux] 


Reserved 





if rA=0 then b <— 0 
else b & (rA) 
EA < b+(rB) 
rD¢-MEMC(EA, 4) 
rA<EA 


EA is the sum (rAl0)+(rB). The word in memory addressed by EA is loaded into rD. 
EA is placed into rA. 


If operand rA=0 the MPC601 does not update register r0, or if rA=rD the load data is 
loaded into register rD and the register update is suppressed. The PowerPC architecture 
defines load with update instructions with operand rA=0 or rA=rD as invalid forms 


Other registers altered: 


¢ None 
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lwzx lwzx 


Load Word and Zero Indexed Integer Unit 


IWZx rD,rA,rB 


[POWER mnemonic: Ix] 


Reserved 





if rA=0 then bé—0 
else bé-rA 
EAb+rB 
rD¢-MEMC(EA, 4) 


EA is the sum (rAl0) + (rB). The word in memory addressed by EA is loaded into rD. 


Other registers altered: 


« None 
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mas kgx POWER Architecture Instruction mas kgx 


Mask Generate Integer Unit 

maskg rA,rS,rB (Rc=()) 

maskg. rA,rS,rB (Rc=1) 

ee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 
Let mstart=rS[27-—31], specifying the starting point of a mask of ones. Let 
mstop=rB[27-3 1], specifying the end point of the mask of ones. 
If mstart < mstop+1 then 
MASK (mstart..mstop)=ones 
MASK (all other bits)=zeros 
If mstart=mstop =1 then 
MASK(0-31)=ones 
If mstart>mstop+1 then 
MASK (mstop+1..mstart-1)=zeros 
MASK (all other bits)=ones 
MASK is then placed in rA. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC6(0)1. 
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maskirx power Architecture Instruction maskir« 


Mask Insert from Register Integer Unit 
maskir rA,rS,rB (Rc=() 
maskir. rA,rS,rB (Rc=1) 
aaa he ed = 
0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 
Register rS is inserted into rA under control of the mask in rB. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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merf merf 


Move Condition Register Field Integer Unit 


merf crfD,crfS 


Reserved 





0 5 6 8 9 10 11 13 14 15 16 20 21 30 31 


CR[4*erfD:4*crfD+3] <— CR[4*crfS:4*crfS+3] 
The contents of condition register field crfS are copied into condition register field crfD. 
All other condition register fields remain unchanged. 


Note that if the link bit (bit 31) is set for this instruction, the PowerPC architecture 
considers the instruction to be of an invalid form. Relative to the MPC6()1, this instruction 
executes and the link register is left in an undefined state. 


Note: Use of invalid instruction forms is not recommended. This description is provided 
for informational purposes only. 
Other registers altered: 
¢ Condition Register (CR field specified by operand erfD): 
Affected: LT, GT, EQ, SO 





MOTOROLA Chapter 10. Instruction Set 10-113 


merfs merfs 


Move to Condition Register from FPSCR Floating-Point Unit 


merfs crfD,crfS 


Reserved 





0 5 6 8 9 10 11 1314 15 16 20 21 30 31 


The contents of FPSCR field crfS are copied to CR Field erfD. All other CR fields are 
unchanged. All exception bits copied are reset to zero in the FPSCR. 


Other registers altered: 


* Condition Register (CR Field specified by operand crfS): 


Affected: FX, OX (if erfS=(0) 
Affected: UX, ZX, XX, VXSNAN (if erfS=1) 
Affected: VXISI, VXIDI, VXZDZ, VXIMZ | (if erfS=2) 
Affected: VX VC (if crfS=3) 
Affected: VXSOFT, VXSQRT, VXCVI (if crfS=5) 
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mcrxr mcrxr 


Move to Condition Register from XER Integer Unit 


merxr crfD 


Reserved 





0 5 6 8 9 10 11 15 16 20 21 30 31 


CR[4*crfD+3 ]<—XER[0-3] 
XER[0-3]<— b’0000” 


The contents of XER[0Q-3] are copied into the condition register field designated by erfD. 
All other fields of the condition register remain unchanged. XER[0—3] is cleared to zero. 
Other registers altered: 
¢ Condition Register (CR Field specified by crfD operand): 
Affected: LT, GT, EQ, SO 
¢ XER[0-3]: 
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THIS PAGE 
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‘LEFT BLANK. 
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mfcr mfcr 





Move from Condition Register Integer Unit 
mfcr rD 
Reserved 
0 56 10 11 15 16 20 21 30 31 
rD< CR 


The contents of the condition register are placed into rD. 


Other registers altered: 


e None 
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mffsx mffsx 


Move from FPSCR Integer Unit 
mffs frD (Rc=0) 
mffs. frD (Rc=1) 

Reserved 





0 56 10 11 15 16 20 21 30 31 


The contents of the FPSCR are placed into bits 32-63 of register frD. Bits 0-31 of register 
frD are undefined. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 


POWER Compatibility Note: The PowerPC architecture defines bits 0-31 of floating- 
point register frD as undefined. In the MPC601, these bits take on the value 
x ‘FFF8_ 0000’. 
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mfmsr 


Move from Machine State Register 


mfmsr rD 





0 56 10 11 15 16 20 21 


rD< MSR 
The contents of the MSR are placed into rD. 


This is a supervisor-level instruction. 
Other registers altered: 


¢ None 
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mfmsr 


Integer Unit 


Reserved 


30 31 





10-119 





mfspr | _ mfspr 


Move from Special Purpose Register Integer Unit 


mfspr rD,SPR 


Reserved 





0 56 10 11 20 21 30 31 


nrD[5-9] ll rD[(0-4] 
rD< SPR(n) 


The SPR field denotes a special purpose register, encoded as shown in Table 10-4. The 
contents of the designated special purpose register are placed into rD. 


The value of SPR[O] is 1 if and only if reading the register is at the supervisor-level. 
Execution of this instruction specifying a supervisor-level register when MSR[PR]=1 will 
result in a supervisor-level instruction type program exception. 


If the SPR field contains a value that is not valid for the MPC6()1, the instruction form is 
invalid. For an invalid instruction form in which SPR[O]=1, if MSR[PR]=1 a supervisor- 
level instruction type program exception will occur instead of an no-op. 


Other registers altered: 


¢ None 


Table 10-4. SPR Encodings for mfspr 


Register 


00770 


XER 
DEC 
LR 
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Table 10-4. SPR Encodings for mfspr(Continued) 


SPRI[5-9] | SPR[O-4] Name 
00000 11001 SDR1 Supervisor 


25 

26 00000 11010 Supervisor 
27 11011 Supervisor 
272 10000 Supervisor 
273 10001 Supervisor 


1008 11114 10000 Checkstop Supervisor 
Register 
(HIDO) 

1009 11111 10001 Debug Mode Supervisor 
Register 
(HID1) 


1010 10010 IABR (HID2) | Supervisor 
1013 11111 10101 DABR (HID5) Supervisor 
1023 11111 11414 PIR(HID15) | Supervisor 


*Note that the order of the two 5-bit halves of the SPR number is reversed compared with actual 
instruction coding. 





If the SPR field contains any value other than one of these implementation-specific values or one 
of the values shown in Table 3-40, the instruction form is invalid. 


spr[0]=1 if and only if writing the register is supervisor-level. Execution of this instruction specifying 
a defined and supervisor-level register when MSR[PR]=1 results in a privilege violation type pro- 
gram exception. 





MOTOROLA Chapter 10. Instruction Set 10-121 


For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear 
directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves 
that are reversed in the instruction, with the high-order 5 bits appearing in bits 16-20 of the instruc- 
tion and the low-order 5 bits in bits 11-15. 


SPR encodings for the DEC, MQ, RTCL and RTCU registers are not part of the PowerPC architec- 
ture. 
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mfsr mfsr 





Move from Segment Register Integer Unit 
mfsr rD,SR 
Reserved 
0 56 10 1112 15 16 20 21 30 31 
rD<-SEGREG(SR) 


The contents of segment register SR is placed into rD. 
This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations; using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Other registers altered: 


¢ None 
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mfsrin mfsrin 


Move from Segment Register Indirect Integer Unit 


mfsrin rD,rB 


Reserved 





0 56 10 11 15 16 20 21 30 31 


rD<-SEGREG(rB(0-3]) 
The contents of the segment register selected by bits (0-3 of rB are copied into rD. 


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction exception. 


Other registers altered: 


« None 
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micrf micrf 


Move to Condition Register Fields Integer Unit 


mterf CRM,rS 


Reserved 





0 5 6 10 11 12 19 20 21 


mask€-(4)(CRM{[O}) ll (4)(CRM{[1]) ll... (4)(CRMI7]) 
CRE(r$[32-63] & mask) | (CR & —mask) 


The contents of rS are placed into the condition register under control of the field mask 
specified by CRM. The field mask identifies the 4-bit fields affected. Let i be an integer in 
the range 0-7. If CRM(i) = 1, CR Field i (CR bits 4*i through 4*i+3) is set to the contents 
of the corresponding field of the of rS. | 


Other registers altered: 
CR fields selected by mask 
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mtfsb0x mitfsb0x 


Move to FPSCR Bit 0 Integer Unit 
mtfsb0 crbD (Rc=0) 
mtfsb0. crbD (Rc=1) 

Reserved 





0 56 10 11 15 16 20 21 30 31 


Bit crbD of the FPSCR is cleared to zero. All other bits of the FPSCR are unchanged. 


Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FPSCR bit crbD 
Note: Bits 1 and 2 (FEX and VX) cannot be explicitly reset. 
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mtfsb1 x mtfsb1 x 


Move to FPSCR Bit 1 Integer Unit 
mtfsbl crbD (Rc=(0) 
mtfsbl. crbD (Rc=1) 

Reserved 





0 56 10 11 15 16 20 21 30 31 


Bit crbD of the FPSCR is set to one. All other bits of the FPSCR are unchanged. 


Other registers altered: 
* Condition Register (CR1 Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Floating-point Status and Control Register: 
FPSCR bit erbD 


Note: Bits 1 and 2 (FEX and VX) cannot be explicitly reset. 
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mtfsfx mtfstx 


Move to FPSCR Fields Integer Unit 
mtfsf FM, frB (Rc=0) 
mtfsf. FM, frB (Rc=1) 

Reserved 





0 14 15 16 20 21 30 31 


Bits 32-63 of register frB are placed into the FPSCR under control of the field mask 
specified by FM. The field mask identifies the 4-bit fields affected. Let i be an integer in the 
range ()-7. If FM(i) = 1, FPSCR Field i (FPSCR bits 4*i through 4*1+3) is set to the contents 
of the corresponding field of the low-order 32 bits of register frB. 


The other PowerPC implementations, the move to FPSCR fields (mtfsf) instruction may 
perform more slowly when only a portion of the fields are updated. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Floating-point Status and Control Register: 
FPSCR fields selected by mask 
Updating fewer than all eight fields of the FPSCR may have substantially poorer 
performance on some implementations than updating all the fields. 


When FPSCR[(-3] is specified, bits 0 (FX) and 3 (OX) are set to the values of frB[32] and 
frB[35] (.e., even if this instruction causes OX to change from 0 to 1, FX is set from 
frB[32] and not by the usual rule that FX is set to 1 when an exception bit changes from () 
to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule and not from 
frB[33-34]. 
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mtfsfix mtfsfix 


Move to FPSCR Field Immediate Integer Unit 
mtfsfi crfD,IMM (Rce=0) 
mtfsfi. crfD,IMM (Rc=1) 

Reserved 






0 5 6 8 9 10 1112 15 16 19 20 21 30 31 





The value of the IMM field is placed into FPSCR field crfD. All other FPSCR fields are 
unchanged. 


Other registers altered: 
* Condition Register (CR1 Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Floating-point Status and Control Register: 
FPSCR field erfD 


When FPSCR[0-3] is specified, bits 0 (FX) and 3 (OX) are set to the values of IMM[0] and 
IMM{3] (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from 
IMM[0] and not by the usual rule that FX is set to 1 when an exception bit changes from 0) 
to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule, given in Section 2.2.3, 
“Floating-Point Status and Control Register (FPSCR)” and not from IMM[1—2]. 
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mitmsr | mtmsr 





Move to Machine State Register Integer Unit 
mtmsr rs 
Reserved 
0 56 10 11 15 16 20 21 30 31 
MSR¢-rS[(0-31] 


The contents of rS are placed into the MSR. 


This is a supervisor-level instruction and context synchronizing. See Section 3.1.2, 
“Context Synchronization” for the definition of context synchronization. 


Other registers altered: 
MSR 
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mispr mispr 


Move to Special Purpose Register Integer Unit 


mtspr SPR,rS 


Reserved 





n= rD[5-9] i! rD[0O-4] 
SPREG(n)<€—rS[0-3 1] 


The SPR field denotes a special purpose register, encoded as shown in Table 10-4. The 
contents of rS are placed into the designated special purpose register. 


The value of SPR[O] is 1 if and only if writing the register is a supervisor-level operation. 
Execution of this instruction specifying a defined and supervisor-level register when 
MSR[PR]=1 results in a supervisor-level instruction exception. 


If the rS field contains an invalid value, the instruction form is invalid. For an invalid 
instruction form in which SPR[O]=1, if MSR[PR]=1 a supervisor-level instruction 
exception will occur instead of a no-op. 


Other registers altered: 


None 
Table 10-4 lists the SPR encodings for the MPC601. 


Table 10-5. SPR Encodings for mtspr 


Register 
SPR[5~9] | SPR[O-4] Heme 












@ 


00000 
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10-132 


Table 10-5. SPR Encodings for mtspr(Continued) 


Register 
Decimal | SPR[5-9] | SPR[0—4] ee 


10011 Supervisor 
10110 Supervisor 
11001 Supervisor 
11010 Supervisor 
11011 Supervisor 
10000 Supervisor 
10001 Supervisor 
10010 Supervisor 
10011 Supervisor 
2 11010 EAR Supervisor 


1008 11111 10000 Checkstop Supervisor 
Register 
(HIDO) 

1009 11111 10001 Debug Mode 
Register 
(HID1) 


1010 11111 10010 IABR (HID2) Supervisor 


1013 11111 10101 DABR (HIDS5) Supervisor 
1023 11111 11111 PIR (HID15) Supervisor 










=? 
ie) 


22 
25 
26 
27 
272 
273 
274 
275 
82 









Supervisor 
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mtsr misr 





Move to Segment Register Integer Unit 
mtsr SR,rS 
Reserved 
0 56 10 11 12 15 16 20 21 30 31 
SEGREG(SR)<-(rS) 


The contents of rS is placed into SR. 
This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Other registers altered: 


« None 





MOTOROLA Chapter 10. Instruction Set 10-133 


mtsrin mtsrin 


Move to Segment Register Indirect Integer Unit 


mtsrin rS,rB 
(POWER mnemonic: mtsri] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


SEGREG(rB[0-3})<-(rS) 
The contents of rS are copied to the segment register selected by bits 0-3 of rB. 


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction exception. 


Other registers altered: 


¢ None 





10-134 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


mulx POWER Architecture Instruction mulx 


Multiply Integer Unit 

mul rD,rA,rB (OE=0 Rc=0) 

mul. rD,rA,rB (OE=0 Rc=1) 

mulo rD,rA,rB (OE=1 Rc=0) 

mulo. rD,rA,rB (OE=1 Rc=1) 

a Ea De De 
10 11 15 16 20 21 22 30 31 


This instruction is not part of the PowerPC architecture. 


Bits 0-31 of the product (rA)*(rB) are placed into rD. Bits 32-63 of the product (rA)*(rB) 
are placed into the MQ register. 


If Rc=1, then LT,GT and EQ reflect the result in the MQ register (the low order 32 bits). If 
OE=1 then SO and OV are set to one if the product cannot be represented in 32 bits. 


If the smaller absolute value of the two multipliers is placed in rB, the instruction may 
complete execution more quickly. See 7.3.2.1, “Integer Instructions Timing Examples” for 
additional information about instruction performance. 

Other registers altered: 


* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 


Note: This instruction is specific to the MPC601. 
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mulhwx mulhwx 


Multiply High Word Integer Unit 
mulhw rD,rA,rB (Rce=0) 
mulhw. rD,rA,rB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 22 30 31 


prod[0-63]<—rA[32-63 ]*rB[32-63] 
rD[32-63]<—prod[0-3 1] 
rD[0-3 1]<—undefined 


The contents of rA and of rB are interpreted as 32-bit signed integers. They are multiplied 
to form a 64-bit signed integer product. The high-order 32 bits of the 64-bit product are 
placed into rD. 


If the smaller absolute value of the two multipliers is placed in rB, the instruction may 
complete execution more quickly. See 7.3.2.1, “Integer Instructions Timing Examples” for 
additional information about instruction performance. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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mulhwux mulhwu. 


Multiply High Word Unsigned Integer Unit 
mulhwu rD,rA,rB (Rc=0) 
mulhwu. rD,rA,rB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 22 30 31 


prod{0-63]<-r A[32-63 ]*rB[32-63] 
rD[32-63 }prod[0-3 1] 
rD[0-3 1 ]}<-undefined 


The contents of rA and of rB are extracted and interpreted as 32-bit unsigned integers. 
They are multiplied to form a 64-bit unsigned integer product. The high-order 32 bits of the 
64-bit product are placed into rD, 


If the smaller absolute value of the two multipliers is placed in rB, the instruction may 
complete execution more quickly. See 7.3.2.1, “Integer Instructions Timing Examples” for 
additional information about instruction performance. 


This instruction causes the contents of the MQ to become undefined. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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mullwx mullwx 


Multiply Low Integer Unit 
mullw rD,rA,rB (OE=0 Rce=0) 
mullw. rD,rA,rB (OE=0 Rc=1) 
mullwo rD,rA,rB (OE=1 Rc=0) 
mull wo. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: muls, muls., mulso, mulso.] 


Fe ea ee ee ee 


10 11 15 16 20 21 22 30 31 


rD<rA[32-63]*rB[32-63] 


The low-order 32 bits of the 64-bit product (rA)*(rB) are placed into rD. The low-order 
bits of the 32-bit product are independent of whether the operands are treated as signed or 
unsigned integers. However, OV is set based on the result interpreted as a signed integer. 


If the smaller absolute value of the two multipliers is placed in rB, the instruction may 
complete execution more quickly. See 7.3.2.1, “Integer Instructions Timing Examples” for 
additional information about instruction performance. 


If OE=1, then OV is set to one if the product cannot be represented in 32 bits. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 
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mulli mulll 


Multiply Low Immediate Integer Unit 


mulli rD,rA,SIMM 
[POWER mnemonic: muli] 


i ee a Se ee ee eS aes 
5 


0 6 10 11 15 16 31 


prod[0-48]<-rA*SIMM 
rD<prod| 16-48] 


The low-order 32 bits of the 48-bit product (rA)*SIMM are placed into rD. The low-order 
bits of the 32-bit product are independent of whether the operands are treated as signed or 
unsigned integers. 

Other registers altered: 


« None 
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nabs«x POWER Architecture Instruction nabs 


Negative Absolute Integer Unit 
nabs rD,rA (OE=0 Rc=(0) 
nabs. rD,rA (OE=0 Rc=1) 
nabso rD,rA (OE=1 Rc=0) 
nabso. rD,rA (OE=1 Rc=1) 
Reserved 





This instruction is not part of the PowerPC architecture. 
The negative absolute value -I(rA)I is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
« XER: 
Affected: SO, OV (if OE=1) 


Note that nabs never overflows. If OE=1 then XER(OV) is cleared to zero and XER(SO) 
is not changed. 


Note: This instruction is specific to the MPC6()1. 
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nandx | nandx 


NAND Integer Unit 

nand rA,rs,rB (Rc=0) 

nand. rA,rS,rB (Rc=1) 

a re ee ae ee ee a76 Re| 

0 5 6 10 11 15 16 20 21 30 31 
rAq +(rS) & (rB)) 


The contents of rS are ANDed with the contents of rB and the one’s complement of the 
result is placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
NAND with rA=rB can be used to obtain the one's complement. 
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negx negx 





Negate Integer Unit 
neg rD,rA (OE=0 Rc=0) 
neg. rD,rA (OE=0 Rc=1) 
nego rD,rA (OE=1 Rce=() 
nego. rD,rA (OE=1 Rc=1) 

Reserved 
0 5 6 10 11 15 16 20 21 22 30 31 

rD¢-{rA) + 1 


The sum -{rA) + 1 is placed into rD. 


If rA contains the most negative 32-bit number (x ‘8000_0000’), the low-order 32 bits of 
the result contain the most negative 32-bit number and, if OE=1, OV is set. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO OV (if OE=1) 
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norx norx 


NOR Integer Unit 

nor rA,rS,rB (Rc=0) 

nor. rA,rS,rB (Rc=1) 

0 5 6 10 11 15 16 20 21 30 31 
rA<¢ —((rS) ! (rB)) 


The contents of rS are ORed with the contents of rB and the one’s complement of the result 
is placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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Orx Oorx 


OR Integer Unit 
or rA,rS,rB (Re=0) 

or. rA,rS,rB (Rc=1) 

ee ee ae ee ee ee 44a Re 
0 5 6 10 11 15 16 20 21 30 31 


rA€—(rS) | (rB) 
The contents of rS is ORed with the contents of rB and the result is placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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orex Orcx 


OR with Complement Integer Unit 
orc rA,rS,rB (Rce=(0) 

ore. rA,rS,rB (Rc=1) 

0 5 6 10 11 15 16 20 21 30 31 


rA €(rS)!=(rB) 
The contents of rS is ORed with the complement of the contents of rB and the result is 
placed into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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orl orl 
OR Immediate Integer Unit 
ori rA,rS,;UIMM 

[POWER mnemonic: oril] 


0 


5 6 10 11 15 16 31 


rA€(rS) | ((16)0 | UIMM) 
The contents of rS is ORed with x’0000' || UIMM and the result is placed into rA. 


The preferred "no-op" (an instruction that does nothing) is: 
ori 0,0,0 


Other registers altered: 


« None 
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oris oris 


OR Immediate Shifted Integer Unit 
oris rA,rS,UIMM 


[POWER mnemonic: oriu] 


0 


5 6 10 11 15 16 31 
rA€(rS) | (UIMM II (16)0) 
The contents of rS is ORed with UIMM II x’0000’ and the result is placed into rA. 


Other registers altered: 


« None 
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rfi rfi 


Return from Interrupt Integer Unit 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


MSR[16-31}<—SRR1[16-31] 
NIA¢-SRRO[0-29] Il ObO0 


Bits 16-31 of SRR1 are placed into bits 16-31 of the MSR, then the next instruction is 
fetched, under control of the new MSR value, from the address SRRO[0—29] Il b‘OO*. 


This is a supervisor-level instruction and is context synchronizing. 


Other registers altered: 
MSR 
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rlmix POWER Architecture Instruction rimix 


Rotate Left then Mask Insert Integer Unit 
rimi rA,rS,rB,MB,ME — (Rc=0) 
rimi. rA,rS,rB,MB,ME (Rc=1) 

10 11 15 16 20 21 25 26 30 31 


This instruction is not part of the PowerPC architecture. 
The contents of rS is rotated left the number of positions specified by bits 27-31 of rB. The 
rotated data is inserted into rA under control of the generated mask. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC60)1. 
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rlwimix rlwimix 


Rotate Left Word Immediate then Mask Insert Integer Unit 
rlwimi rA,rS,SH,MB,ME (Rc=0) 
rlwimi. * rA,rS,SH,MB,ME (Rc=1) 


[POWER mnemonics: rlimi, rlimi.] 


«| [= [x 
0 


5 6 10 11 15 16 20 21 25 26 30 31 


n¢-SH 

r—ROTL(S, n) 
m—MASK(MB, ME) 
rA@(r&M) | (rA &—m) 


The contents of rS are rotated left SH bits. A mask is generated having 1-bits from bit MB 
through bit ME and 0)-bits elsewhere. The rotated data is inserted into rA under control of 


the generated mask. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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rlwinmx rilwinm x 


Rotate Left Word Immediate then AND with Mask Integer Unit 
rlwinm rA,rS,SH,MB,ME (Rce=0) 
rlwinm. rA,rS,SH,MB,ME (Rc=1) 


[POWER mnemonics: rlinm, rlinm. ] 


10 11 20 21 25 26 30 31 


n¢-SH 

ré—-ROTL(rS, rn) 
m€~MASK(MB, ME) 
rAér &m 


The contents of rS are rotated left SH bits. A mask is generated having 1-bits from bit MB 
through bit ME and ()-bits elsewhere. The rotated data is ANDed with the generated mask 
and the result is placed into rA. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 


The opcode rlwinm can be used to extract an n-bit field, that starts at bit position b in rS[0- 
31], right-justified into rA (clearing the remaining 32-n bits of rA), by setting SH=b+n, 
MB=32-n, and ME=31. It can be used to extract an n-bit field, that starts at bit position 5 in 
rS[0-31], left-justified into rA (clearing the remaining 32-n bits of rA), by setting SH=b, 
MB=(), and ME=n-1. It can be used to rotate the contents of a register left (or right) by 
bits, by setting SH=n (32-n), MB=0, and ME=31. It can be used to shift the contents of a 
register right by n bits, by setting SH=32-N, MB=n, and ME=31. It can be used to clear the 
high-order 5 bits of a register and then shift the result left by n bits by setting SH=n, MB=b- 
nand ME=31-n. It can be used to clear the low-order n bits of a register, by setting SH=0, 
MB=0), and ME=31-n. 
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rlwnm x rlwnm« 


Rotate Left Word then AND with Mask Integer Unit 
rlwnm rA,rS,rB,MB,ME (Re=0) 
rlwnm. rA,rS,rB,MB,ME (Rc=1) 


[POWER mnemonics: rlnm, rInm.] 


10 11 15 16 20 21 25 26 30 31 


ne—rB[ 27-31] 

r—ROTL(SS, rn) 

m«-MASK(MB, ME) 

rAcr & m 
The contents of rS are rotated left the number of bits specified by rB[27—31]. A mask is 
generated having 1-bit from bit MB through bit ME and ()-bits elsewhere. The rotated data 
is ANDed with the generated mask and the result is placed into rA. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 

The opcode rlwnm can be used to extract an n-bit field, that starts at variable bit position 
b in rS[0-31], right-justified into rA (clearing the remaining 32-n bits of rA), by setting 
rB[27-3 1 ]=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field, that starts 
at variable bit position b in rS[0-31], left-justified into rA (clearing the remaining 32-n bits 
of rA), by setting rB[27—3 1]=b, MB =0), and ME=n-1. It can be used to rotate the contents 
of a register left (or right) by variable n bits, by setting rB[27—31]=n (32-N), MB=0, and 
ME=31. 


Equivalent mnemonics are provided for some of these uses. 
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rribx POWER Architecture Instruction rribx 


Rotate Right and Insert Bit Integer Unit 
rrib rA,rS,rB (Re=()) 
rrib. rA,rS,rB (Rc=1) 
ee ee ee ee es eS ee ee 
0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Bit 0 of rS is rotated right the amount specified by bits 27-31 of rB. The bit is then inserted 
into rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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sc sc 


System Call Integer Unit 


[POWER mnemonic: svca] 


Reserved 





0 5 6 10 11 15 16 29 30 31 


This instruction calls the operating system to perform a service. When control is returned 
to the program that executed the system call, the content of the registers depends on the 
register conventions used by the program providing the system service. 


This instruction is context synchronizing, as described in Section 3.1.2, “Context 
Synchronization”. Although the PowerPC architecture considers sc to be a branch 
processor instruction, it is executed by the integer processor in the MPC601. 


Other registers altered: 
Dependent on the system service 


POWER Compatibility Note: The PowerPC sc instruction is substantially different from 
the POWER svc instruction. The following aspects of these instructions were considered 
with respect to POWER compatibility: 


PowerPC defines the se instruction with the “LK” bit set to be an invalid form. POWER 
architecture defines the svc instruction (same opcode as PowerPC sc instruction) with the 
“LK” bit set as a valid form which places the address of the instruction following the sve 
into the link register. In the case of MPC601, an sc instruction with the “LK” bit set will 
execute correctly (as defined in PowerPC) and will update the link register with the address 
of the instruction following the sc instruction. 


PowerPC defines the sc instruction in such a manner that requires bit 30 of the instruction 
to be b'l' (when bit 30 is b’0’, the instruction is considered reserved). The POWER 
architecture svc instruction does not have such a restriction, and uses this bit to define an 
alternate form of the sve instruction. Although the MPC601 does not support this alternate 
form of the sve instruction, it does ignore the state of bit 30 of the instruction during decode 
and execution. 


As a result of executing an sc instruction, PowerPC defines bits 0-15 of register SRR1 to 
be undefined. In the case of MPC60)1, execution of the sc instruction will cause bits 16-31 
of the instruction to be placed into bits 0-15 of register SRR1. 


The effective address of the instruction following the system call instruction is placed into 
SRRO. Bits 16-31 of the MSR are placed into bits 16-31 of SRR1. 
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Then a system call exception is generated. The exception causes the MSR to be altered as 
described in Section 5.4, “Exception Definitions”. 


The exception causes the next instruction to be fetched from offset x*CO0’ from the 
physical base address indicated by the new setting of MSR[IP]. This instruction is context- 
synchronizing. 

Other registers altered: 

SRRO SRR1 MSR 
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slex POWER Architecture Instruction slex 


Shift Left Extended Integer Unit 

sle rA,rS,rB (Rce=()) 

sle. rA,rs,rB (Rc=1) 

aie eel a a ee ee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left n bits where n is the shift amount specified in bits 27-31 of rB. 
The rotated word is placed in the MQ register. A mask of 32-n ones followed by 7 zeros is 
generated. The logical AND of the rotated word and the generated mask is placed in rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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sleqx POWER Architecture Instruction sleqx 


Shift Left Extended with MQ Integer Unit 

sleq rA,rS,rB (Rc=0) 

sleq. rA,rS,rB (Rc=1) 

ge ee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left n bits where n is the shift amount specified in bits 27-31 of rB. 
A mask of 32-n ones followed by n zeros is generated. The rotated word is then merged 
with the contents of the MQ register, under control of the generated mask. The merged 
word is placed in rA. The rotated word is placed in the MQ register. 
Other registers altered: 

* Condition Register (CRO Field): 

Affected: LT, GT, EQ, SO (if Rc=1) 

Note: This instruction is specific to the MPC601. 
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sl Iqx POWER Architecture Instruction sl iqx 


Shift Left Immediate with MQ Integer Unit 

sliq rA,rS,SH (Rc=0) 

sliq. rA,rS,SH (Rc=1) 

ee ee Ee aie eee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 1 bits where n is the shift amount specified by SH. The rotated 
word is placed in the MQ register. A mask of 32-n ones followed by n zeros is generated. 
The logical AND of the rotated word is placed into rA. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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slliqx POWER Architecture Instruction slliqx 


Shift Left Long Immediate with MQ Integer Unit 
slliq rA,rS,SH (Rc=()) 

slliq. rA,rS,SH (Rc=1) 

0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left n bits where n is the shift amount specified by SH. A mask of 
32-n ones followed by n zeros is generated. The rotated word is then merged with the 
contents of the MQ register, under control of the generated mask. The merged word is 
placed into rA. The rotated word is placed into the MQ register. 
Other registers altered: 

* Condition Register (CRO Field): 

Affected: LT, GT, EQ, SO (if Rc=1) 

Note: This instruction is specific to the MPC6()1. 
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sllqx POWER Architecture Instruction sllqx 


Shift Left Long with MQ Integer Unit 

sllq rA,rS,rB (Rc=(0) 

sliq. rA,rS,rB (Rc=1) 

a a 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 
Register rS is rotated left n bits where n is the shift amount specified in bits 27-31 of rB. 


When bit 26 of rB is a zero, a mask of 32-n ones followed by n zeros is generated. A word 
of zeros is then merged with the contents of the MQ register, under control of the generated 
mask. 


When bit 26 of rB is a one, a mask of 32-n ones followed by 1 ones is generated. A word 
of zeros is then merged with the contents of the MQ register, under control of the generated 
mask. 


The merged word is placed into rA. The MQ register is not altered. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 


10-160 PowerPC 601 RISC Microprecessor User's Manual MOTOROLA 


slqx POWER Architecture Instruction slqx 


Shift Left with MQ Integer Unit 

slq rA,rS,rB (Rc=0) 

slq. rA,rS,rB (Rc=1) 

eae Ee i eee ee ee eee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left n bits where n is the shift amount specified in bits 27-31 of rB. 
The rotated word is placed in the MQ register. 


When bit 26 of rB is a zero, a mask of 32-n ones followed by n zeros is generated. 
When bit 26 of rB is a one, a mask of all zeros is generated. 
The logical AND of the rotated word and the generated mask is placed into rA. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC60)1. 
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—tk, 
bad 


slwx Slwx 


Shift Left Word Integer Unit 
slw rA,rS,rB (Rc=() 
slw. rA,rS,rB (Rc=1) 


[POWER mnemonics: sl, sI.] 


n—rB[27-3 1] 
rA<-ROTL(FS, n) 


If bit 16 of rB=0, the contents of rS are shifted left the number of bits specified by rB[26— 
31]. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the 
right. The 32-bit result is placed into rA. If bit 16 of rB=1, 32 zeros are placed into rA. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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sraqx POWER Architecture Instruction sraqx 


Shift Right Algebraic with MQ Integer Unit 
sraq rA,rS,rB (Rc=() 

sraq. rA,rS,rB (Rc=1) 

ie a ae Re ee ae 920 Fe 
0 56: 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified in bits 27-31 of 
rB. When bit 26 of rB is a zero, a mask of n zeros followed by 32-n ones is generated. 
When bit 26 of rB is a one, a mask of all zeros is generated. The rotated word is placed in 
the MQ register. The rotated word is then merged with a word of 32 sign bits from rS, under 
control of the generated mask. 


The merged word is placed in rA. 


The rotated word is ANDed with the complement of the generated mask. This 32-bit result 
is ORed together and then ANDed with bit 0 of rS to produce XER[CA]. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
° XER: 
Affected: CA 
All shift right algebraic instructions can be used for a fast divide by 2() if followed with 
addze. 


Note: This instruction is specific to the MPC601. 
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Sraiqx — POWER Architecture Instruction sraiqx 


Shift Right Algebraic Immediate with MQ Integer Unit 
sraiq rA,rS,SH (Rc=() 

sraiq. rA,rS,SH (Rc=1) 

0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified by SH. A mask 
of n zeros followed by 32-n ones is generated. The rotated word is placed in the MQ 
register. The rotated word is then merged with a word of 32 sign bits from rS, under control 
of the generated mask. 


The merged word is placed in rA. 


The rotated word is ANDed with the complement of the generated mask. This 32-bit result 
is ORed together and then ANDed with bit 0 of rS to produce XER[CA]. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
° XER: 
Affected: CA 
All shift right algebraic instructions can be used for a fast divide by 2() if followed with 
addze. 


Note: This instruction is specific to the MPC601. 
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Ssrawx srawx 


Shift Right Algebraic Word Integer Unit 
sraw rA,rS,rB (Rc=0) 
sraw. rA,rS,rB (Rc=1) 


[POWER mnemonics: sra, sra.] 


ae REE a ee 


10 11 15 16 20 21 30 31 


nm—rB[27-31] 
rA<ROTL(rS, n) 


If rB[26]=0,then the contents of rS are shifted right the number of bits specified by rB[27— 
31]. Bits shifted out of position 31 are lost. The result is padded on the left with sign bits 
before being placed into rA. If rB[26]=1, then rA is filled with 32 sign bits (bit 0) from rS. 
CRO is set based on the value written into rA. 


Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO Cif Rc=1) 
* XER: 
Affected: CA 
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srawix srawix 


Shift Right Algebraic Word Immediate Integer Unit 
srawi rA,rS,SH (Rc=0) 
srawi. rA,rS,SH (Rc=1) 


[POWER mnemonics: srai, srai.] 


5 


0 6 10 11 15 16 20 21 30 31 


n—~-SH 
rA¢ROTL(FS, 32-n) 


The contents of rS are shifted right SH bits. Bits shifted out of position 31 are lost. The 
shifted value is sign extended before being placed in rA. The 32-bit result is placed into rA. 
XER[CA] is set to 1 if rS contains a negative number and any 1-bits are shifted out of 
position 31; otherwise XER[CA] is cleared to 0. A shift amount of zero causes XER[CA] 
to be cleared to 0. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: CA 
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Srex POWER Architecture Instruction Srex 


Shift Right Extended Integer Unit 

sre rA,rS,rB (Rc=0) 

sre. rA,rS,rB (Rc=1) 

a a 0 a a eet eee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where 1 is the shift amount specified in bits 27-31 of 
rB. The rotated word is placed in the MQ register. A mask of n zeros followed by 32-n ones 
is generated. The logical AND of the rotated word and the generated mask is placed in rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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sreax POWER Architecture Instruction Sreax 


Shift Right Extended Algebraic Integer Unit 

srea rA,rS,rB (Re=0) 

srea. rA,rS,rB (Rc=1) 

Le See Ree Sees [ee [eee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified in bits 27-31 of 
rB. A mask of n zeros followed by 32-n ones is generated. The rotated word is placed in 
the MQ register. The rotated word is then merged with a word of 32 sign bits from rS, under 
control of the generated mask. 


The merged word is placed in rA. 


The rotated word is ANDed with the complement of the generated mask. This 32-bit result 
is ORed together and then ANDed with bit 0 of rS to produce XER[CA]. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: CA 
Note: This instruction is specific to the MPC601. 
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sreqx POWER Architecture Instruction sreqx 


Shift Right Extended with MQ Integer Unit 
sreq rA,rS,rB (Rc=0) 

sreq. rA,rS,rB (Rc=1) 

ee Se 729 Fe 
0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified in bits 27-31 of 
rB. A mask of n zeros followed by 32-n ones is generated. The rotated word is then merged 
with the contents of the MQ register, under control of the generated mask. The merged 
word is placed in rA. The rotated word is placed into the MQ register. 
Other registers altered: 

¢ Condition Register (CRO Field): 

Affected: LT, GT, EQ, SO Cif Rc=1) 

Note: This instruction is specific to the MPC6()1. 
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sriqxx POWER Architecture Instruction sriqx 


Shift Right Immediate with MQ Integer Unit 
sriq rA,rS,SH (Rce=0) 

sriq. rA,rS,SH (Rc=1) 

(a a ES ee ee 
0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified by SH. The rotated 
word is placed into the MQ register. A mask of zeros followed by 32-n ones is generated. 
The logical AND of the rotated word and the generated mask is placed in rA. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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srl Iqx POWER Architecture Instruction srl iqx 


Shift Right Long Immediate with MQ Integer Unit 
srliq rA,rS,SH (Rc=0) 
srliq. rA,rS,SH (Rc=1) 

10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified by SH. A mask of 
n zeros followed by 32-n ones is generated. The rotated word is then merged with the MQ 
register, under control of the generated mask. The merged word is placed in rA. The rotated 
word is placed into the MQ register. 
Other registers altered: 

¢ Condition Register (CRO Field): 

Affected: LT, GT, EQ, SO (if Rc=1) 

Note: This instruction is specific to the MPC601. 
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srlqx POWER Architecture Instruction sriqx 


Shift Right Long with MQ Integer Unit 

srlq rA,rs,rB (Rc=(0) 

srlq. rA,rS,rB (Rc=1) 

Fe ae ee (ee re ee ee eee 
10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified in bits 27-31 of rB. 
When bit 26 of rB is a zero, a mask of n zeros followed by 32-n ones is generated. The 
rotated word is then merged with the MQ register, under control of the generated mask. 


When bit 26 of rB is a one, a mask of n ones followed by 32-n zeros is generated. A word 
of zeros is then merged with the contents of the MQ register, under control of the generated 
mask. 


The merged word is placed in rA. The MQ register is not altered. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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srqx POWER Architecture Instruction srqx 


Shift Right with MQ Integer Unit 
srq rA,rS,rB (Rc=0) 
srq. rA,rS,rB (Rc=1) 

[a [s [| «a [=| = re 
0 5 6 10 11 15 16 20 21 30 31 


This instruction is not part of the PowerPC architecture. 


Register rS is rotated left 32-n bits where n is the shift amount specified in bits 27-31 of rB. 
The rotated word is placed into the MQ register. 


When bit 26 of rB is a zero, a mask of n zeros followed by 32-n ones is generated. 
When bit 26 of rB is a one, a mask of all zeros is generated. 
The logical AND of the rotated word and the generated mask is placed in rA. 


Other registers altered: 
¢ Condition Resister (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
Note: This instruction is specific to the MPC601. 
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Srwx Srwx 


Shift Right Word Integer Unit 
srw rA,rS,rB (Rc=0) 
srw. rA,rS,rB (Rc=1) 


[POWER mnemonics: sr, sr.] 


Le a ee ee eee) 
5 


0 6 10 11 15 16 20 21 30 31 


n—rB[27-3 1] 
rACROTL(S, 32-n) 


If rB[26]=0, the contents of rA are shifted right the number of bits specified by rA[27-31]. 
Bits shifted out of position 31 are lost. Zeros are supplied to the vacated positions on the 
left. The 32-bit result is placed into rA. 


If rB[26]=1, then rA is filled with zeros. 


Other registers altered: 
¢ Condition Register (CRO Field): 
' Affected: LT, GT, EQ, SO (if Rc=1) 
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stb stb 


Store Byte Integer Unit 
stb rS,d(rA) 


a ae Ra eat a as ee a 


0 5 6 10 11 15 16 31 


if rA = 0 then b¢—-0 

else be-(rA) 
EA<-b + EXTS(d) 
MEMC(EA, 1)¢—rS[24-31] 


EA is the sum (rAl0)+d. Register rS[24—31] is stored into the byte in memory addressed 
by EA. Register rS is unchanged. 


Other registers altered: 


¢ None 
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stbu stbu 


Store Byte with Update Integer Unit 
stbu rS,d(rA) 


a ae a a ee ee, eee 
0 


5 6 10 11 15 16 31 


if rA = 0 then bh—0 

else b¢-(rA) 
EAé—-b + EXTS(d) 
MEMC(EA, 1)<—rS[24-3 1] 
rA¢-EA 


EA is the sum (rAl0)+d. Register rS[24-31] is stored into the byte in memory addressed 
by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC601 supports execution with rA=0 as shown above. 


Other registers altered: 


e None 
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stbux stbux 


Store Byte with Update Indexed Integer Unit 
stbux rs,rA,rB 


Reserved 





0 5 6 10 11 15 16 21 22 30 31 


if rA = 0 then bé—-0 

else b#(rA) 

EA€b + (rB) 

MEM(BA, 1)€—rS[24-31] 

rA@EA 
EA is the sum (rAl0)+(rB). Register rS[24—31] is stored into the byte in memory addressed 
by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC6()1 supports execution with rA=0 as shown above. 


Other registers altered: 


¢ None 
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stbx stbx 


Store Byte Indexed Inteaer Unit 


stbx rs,rA,rB 


Reserved 





if rA = 0 then b¢-0 

else b¢(rA) 
EA€b + (rB) 

EM(EA, 1) < rS[24-31] 


EA is the sum (rAl0)+(rB). Register rS[24—31] is stored into the byte in memory addressed 
by EA. Register rS is unchanged. 
Other registers altered: 


« None 
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stid stid 


Store Floating-Point Double-Precision Floating-Point Unit 
stfd frS,d(rA) 
eae a EE 
0 5 6 10 11 15 16 30 31 
if rA= 0 then b—-0 
else b¢€-(rA) 
EA@b + EXTS(d) 


MEM(EA, 8)<—-(frS) 
EA is the sum (rAlQ) + d. 


The contents of register frS is stored into the double word in memory addressed by EA. 


Other registers altered: 


« None 
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stfdu stidu 


Store Floating-Point Double-Precision with Update Floating-Point Unit 
stfdu frS,d(rA) 
0 5 6 10 11 15 16 31 
if rA=0 then b¢é-0 
else b(rA) 
EAé—-b +d 
MEMC(EA, 4)<- SINGLE(frS) 
rAEA 


EA is the sum (rAlQ) + d. 
The contents of register frS is stored into the double word in memory addressed by EA. 
EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=(, the 
MPC601 supports execution with rA=( as shown above. 


Other registers altered: 


« None 
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stfdux stfdux 


Store Floating-Point Double-Precision with Update Indexed Floating-Point Unit 


stfdux frS,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA= 0 then bé—-0 
else b¢—-(rA) 
EA¢€—b + (rB) 
MEMC(EA, 8)<—(frS) 
rA@EA 


EA is the sum (rAlQ) + (rB). 
The contents of register frS is stored into the double word in memory addressed by EA. 
EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC60)1 supports execution with rA=0 as shown above. 


Other registers altered: 


« None 
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stfdx stfidx 


Store Floating-Point Double-Precision Indexed Floating-Point Unit 


stfdx frS,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA + 0 then b <0 
else be—(rA) 
EA<b + (rB) 
MEMC(EA, 8)<€—(frS) 


EA is the sum (rAlQ) + (rB). 
The contents of register frS is stored into the double word in memory addressed by EA. 


Other registers altered: 


¢ None 
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stfs stfs 


Store Floating-Point Single-Precision Integer Unit and Floating-Point Unit 
stfs frS,d(rA) 
0 5 6 10 11 15 16 31 
if rA=0 then b¢—-0 
else b<—-(rA) 
EAe-b + EXTS(d) 
MEM(EA, 4)<-SINGLE(frS) 


EA is the sum (rAlQ)+d. 


The contents of register frS is converted to single-precision and stored into the word in 
memory addressed by EA. 


Other registers altered: 


« None 
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stfsu stfsu 


Store Floating-Point Single-Precision with Update Integer Unit and Floating-Point Unit 
stfsu frS,d(rA) 


[Pa Sane Ee el eee eens, 
6 


0 5 10 11 15 16 31 


if rA = 0 then b¢—0 

else b<¢-(rA) 

EA@b + EXTS(d) 
MEMC(EA, 4)<-SINGLE(frS) 
rA@—EA 


EA is the sum (rAlQ) + d. 


The contents of frS is converted to single-precision and stored into the word in memory 
addressed by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC6()1 supports execution with rA=0 as shown above. 


Other registers altered: 


e None 
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stfsux stisux 


Store Floating-Point Single-Precision with Update Indexed Integer Unit and 
Floating-Point Unit 


stfsux frS,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b¢-0 

else b¢€—(rA) 

EA<b + (rB) 

MEMC(EA, 4)¢—-SINGLE(frS) 
rA<-EA 


EA is the sum (rAlQ) + (rB). 


The contents of frS is converted to single-precision and stored into the word in memory 
addressed by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC6()1 supports execution with rA=0 as shown above. 


Other registers altered: 


« None 





MOTOROLA Chapter 10. instruction Set 10-185 


stfsx stfsx 


Store Floating-Point Single-Precision Indexed Integer Unit and 
Floating-Point Unit 


stfsx frS,rA,rB 


Reserved 





if rA=0 then b<-0 

else be-(rA) 

EA<—b + (rB) 

MEMCEA, 4)<-SINGLE(frS) 


EA is the sum (rAlQ) + (rB). 


The contents of register frS is converted to single-precision and stored into the word in 
memory addressed by EA. 


Other registers altered: 


¢ None 
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sth | sth 


Store Half Word Integer Unit 
sth rS,d(rA) 
ae See a Sa ee: ee 
0 5 6 10 11 15 16 31 
if rA= 0 then b<-0 
else be-(rA) 
EA¢—-b + EXTS(d) 


MEM(EA, 2)<—rS[ 16-31] 
EA is the sum (rAl0) + d. Register rS[16-31] is stored into the half word in memory 
addressed by EA. 
Other registers altered: 


« None 
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sthbrx sthbrx 


Store Half Word Byte-Reverse Indexed Integer Unit 


sthbrx rs,rA,rB 


Reserved 





0 5 6 10 14 15 16 20 21 30 31 


if rA= 0 then b<-0 

else b¢-(rA) 

EA<-b + (rB) 

MEMC(EA, 2)€-rS[24-3 1] Il rS[ 16-23] 


EA is the sum (rAl0)+(rB). The contents of rS[24—31] are stored into bits 0-7 of the half 
word in memory addressed by EA. Bits rS[16—23] are stored into bits 8-15 of the half word 
in memory addressed by EA. 


Other registers altered: 


« None 
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sthu sthu 


Store Half Word with Update Integer Unit 
sthu rS,d(rA) 


ee i ee ee ee eee eee 
0 


5 6 10 11 15 16 31 


if rA = 0 then b¢—0 

else b¢-(rA) 

EA€-b + EXTS(d) 

MEMC(EA, 2)<rS{ 16-31] 

rA¢-EA 
EA is the sum (rAlQ)+d. The contents of rS[16—31] are stored into the half word in memory 
addressed by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC601 supports execution with rA=0 as shown above. 


Other registers altered: 


¢ None 
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sthux sthux 


Store Half Word with Update Indexed Integer Unit 


sthux rs,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA= 0 then b¢-0 

else be(rA) 

EA¢-b + (rB) 

MEMCEA, 2)<—rS[ 16-31] 

rA¢-EA 
EA is the sum (rAlQ)+(rB). Register rS[16-31] is stored into the half word in memory 
addressed by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC601 supports execution with rA=0 as shown above. 


Other registers altered: 


¢ None 
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sthx sthx 


Store Half Word Indexed Integer Unit 
sthx rs,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA = 0 then b—0 

else be-(rA) 
EAtb + (rB) 

MEMC(EA, 2)<-rS[16-31] 


EA is the sum (rAlQ) + (rB). Register rS[16-31] is stored into the half word in memory 
addressed by EA. 
Other registers altered: 


¢ None 
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stmw stmw 


Store Multiple Word Integer Unit 


stmw rS,d(rA) 
[POWER mnemonic: stm] 


LS ane ee aie a Sea ae en 
0 


5 6 10 11 15 16 31 


if rA = 0 then b¢—0 


else be—(rA) 
EA@b + EXTS(d) 
rérs 


do while r < 31 
MEMC(EA, 4) <— GPR(r) 


rer + 1 
EA@EA+4 
EA is the sum (rAlQ) + d. 
n= (32 -rS). 


nconsecutive words starting at EA are stored from the GPRs rS through 31. For example, 
if rS=30, 2 words are stored. 


EA must be a multiple of 4; otherwise, the system alignment error handler may be invoked. 


Other registers altered: 


¢ None 
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stswi stswi 


Store String Word Immediate Integer Unit 


stswi rS,rA,NB 
[POWER mnemonic: stsi] 


Reserved 





if rA = 0 then EA—0 


else EA<€—(rA) 
if NB = 0 then n¢—32 
else n¢-NB 
r—rS-1 

ic-0 


do while n>0 
if i= 0 then ré—r+1 (mod 32) 
MEMC(EA, 1)¢-GPR(r)[i:i+7] 
ii+8 
if i= 31 theni¢—0 
EA<-EA+1 
né-n-1 


EA is (rAlQ). Let n = NB if NB#0, n = 32 1f NB=0; n is the number of bytes to store. Let 
nr = CEIL(n/4): nr is the number of registers to supply data. 


nconsecutive bytes starting at EA are stored from GPRs rS through rS+nr-1. 


Bytes are stored left to right from each register. The sequence of registers wraps around 
through GPRO if required. 


Other registers altered: 


¢ None 
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stSwx stswx 


Store String Word Indexed Integer Unit 


stswx rS,rA,rB 


[POWER mnemonic: stsx] 


Reserved 





if rA=0 then b<—0 

else be(rA) 

EA<—b+(rB) 

n¢-XER[25-31] 

r—rS-1 

iO 

do while n>0 
if i= 0 then ré—r+1 (mod 32) 
MEMC(EA, 1)¢~GPR(1)[i:i+7] 
i it8 
if i= 31 then i¢—0 
EA<-EA+1 
n&n-1 


EA is the sum (rAl0)+(rB). Let n = XER[25-31]; 1 is the number of bytes to store. 
Let nr = CEIL(n/4): nr is the number of registers to supply data. 
nconsecutive bytes starting at EA are stored from GPRs rS through rS+nr-1. 


Bytes are stored left to right from each register. The sequence of registers wraps around 
through GPRO if required. 


Other registers altered: 


¢ None 
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stw stw 
Store Word Integer Unit 
stw rS,d(rA) | 

[POWER mnemonic: st] 


ee ee ee ee ee ee ees 
5 


S 
0 6 10 11 15 16 31 


if rA= 0 then b—O 
else be(rA) 
EAeb + EXTS(d) 
MEMC(EA, 4)<rS 


EA is the sum (rAI0) + d. The contents of rS are stored into the word in memory addressed 
by EA. 
Other registers altered: 


« None 
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stwbrx stwbrx 


Store Word Byte-Reverse Indexed Integer Unit 


stwbrx rS,rA,rB 
[POWER mnemonic: stbrx] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA = 0 then be—0 

else be-(rA) 

EAC + (rB) 

MEM(EA, 4)<—rS[24-31] Il rS[ 16-23] Il rS[8-15] ll rS{0-7] 
EA is the sum (rAl0)+(rB). The contents of rS[24—31] are stored into bits 0-7 of the word 
in memory addressed by EA. Bits rS[16-23] are stored into bits 8-15 of the word in 
memory addressed by EA. Bits rS[8—15] are stored into bits 16-23 of the word in memory 
addressed by EA. Bits rS[0—7] are stored into bits 24-31 of the word in memory addressed 
by EA. 


Other registers altered: 


* None 
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StWCcx. StWCX. 


Store Word Conditional Indexed Integer Unit 
stwex. rS,rA,rB 
0 5 6 10 11 15 16 20 21 30 31 
if rA=0 then b—0 
else be(rA) 
EAeb + (rB) 


if RESERVE then 
MEMC(EA, 4)<rS 
RESERVE<0 
CRO@0b00 Il Ob III XER[SO] 
else 
CRO@O0B00 II OBO !l XER[SO] 


EA is the sum (rAlQ)+(rB). 
If a reservation exists, the contents of rS are stored into the word in memory addressed by 


EA and the reservation is cleared. If no reservation exists, the instruction completes without 
altering memory. 


CRO Field is set to reflect whether the store operation was performed (i.e., whether a 
reservation existed when the stwex. instruction commenced execution) as follows. 


CRO[LT GT EQ SO] —b‘00” Il store_performed || XER[SO] 


The EQ bit in the condition register field CRO is modified to reflect whether the store 
operation was performed (i.e., whether a reservation existed when the stwex. instruction 
began execution). If the store was completed successfully, the EQ bit is set to one. 


EA must be a multiple of 4; otherwise, the system alignment error handler may be invoked 
or the results may be undefined. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO 
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stwu stwu 
Store Word with Update Integer Unit 
stwu rS,d(rA) 

[POWER mnemonic: stu] 


ee: Re ee ee ee ee es 
0 


5 6 10 11 15 16 31 


if rA= 0 then b<O 
else be(rA) 
EAeb + EXTS(d) 
MEMC(EA, 4)<—rS 
rAcEA 


EA is the sum (rAl())+d. The contents of rS are stored into the word in memory addressed 
by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 
MPC6()1 supports execution with rA=0 as shown above. 


Other registers altered: 


« None 
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stwux stwux 


Store Word with Update Indexed Integer Unit 


stwux rs,rA,rB 
[POWER mnemonic: stux] 


Reserved 





0 2°56 10 11 15 16 20 21 30 31 


if rA = 0 then be—0 
else be-(rA) 
EAeb + (rB) 
MEMC(EA, 4)<-rS 
rAcEA 


EA is the sum (rAl0O)+(rB). The contents of rS are stored into the word in memory 
addressed by EA. 


EA is placed into rA. 


While the PowerPC architecture defines the instruction form as invalid if rA=(, the 
MPC601 supports execution with rA=() as shown above. 


Other registers altered: 


« None 
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stwx stwx 


Store Word Indexed Integer Unit 


stwx rS,rA,rB 


[POWER mnemonic: stx] 


Reserved 





if rA = 0 then be-0 
else be—(rA) 
EA¢-b + (rB) 
MEMC(EA, 4)<—-rS 


- EA is the sum (rAlO)+(rB). The contents of rS are is stored into the word in memory 


addressed by EA. 


Other registers altered: 


¢ None 
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subftx subfx 


Subtract from Integer Unit 

subf rD,rA,rB (OE=0 Rc=0) 

subf. rD,rA,rB (OE=0 Rc=1) 

subfo rD,rA,rB (OE=1 Rc=() 

subfo. rD,rA,rB (OE=1 Rc=1) 

a ee ee 
10 11 15 16 20 21 22 30 31 


rDe——{rA) + (rB) + 1 
The sum -+rA) + (rB) +1 is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: SO, OV (if OE=1) 
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subfcx subfcx 


Subtract from Carrying Integer Unit 
subfc rD,rA,rB (OE=0 Rc=(0) 
subfc. rD,rA,rB (OE=0 Rc=1) 
subfco rD,rA,rB (OE=1 Rc=0) 
subfco. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: sf, sf., sfo, sfo.] 


Ee dee Ee ee 


10 11 15 16 20 21 22 30 31 
rD¢—~(rA) +(rB) + | 
The sum —-(rA) + (rB) + 1 is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 
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subfex subfex 


Subtract from Extended Integer Unit 
subfe rD,rA,rB (OE=0 Rc=0) 
subfe. rD,rA,rB (OE=0 Rc=1) 
subfeo rD,rA,rB (OE=1 Rc=0) 
subfeo. rD,rA,rB (OE=1 Rc=1) 


[POWER mnemonics: sfe, sfe., sfeo, sfeo.] 


eel ae ee ee ee ee 


10 11 15 16 20 21 22 30 31 
rD~~(rA) + (rB) + XER[CA] 
The sum -rA) + (rB) + XER[CA] is placed into rD. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 
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subfic subfic 


Subtract from Immediate Carrying Integer Unit 


subfic rD,rA,SIMM 
[POWER mnemonic: sfi] 


a 


0 5 6 10 11 15 16 31 


rDe~ A(rA) + EXTS(SIMM) + | 
The sum--(rA) + EXTS(SIMM) + 1 is placed into rD. 
Other registers altered: 
°* XER: 
Affected: CA 
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subfmex subfme x 


Subtract from Minus One Extended . Integer Unit 
subfme rD,rA (OE=0 Re=0) 
subfme. rD,rA (OE=0 Rc=1) 
subfmeo rD,rA (OE=1 Rc=0) 
subfmeo. rD,rA (OE=1 Rc=1) 


[POWER mnemonics: sfme, sfme., sfmeo, sfmeo. | 


Reserved 





rD~——(rA) + XER[CA] - | 
The sum -(rA) + XER[CA] + x >FFFFFFFF’ is placed into rD. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 

Affected: CA 

Affected: SO, OV (if OE=1) 
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subfizex subfzex 


Subtract from Zero Extended Integer Unit 
subfze rD,rA (OE=0 Rc=0) 
subfze. rD,rA (OE=0 Rc=1) 
subfzeo rD,rA (OE=1 Rc=0) 
subfzeo. rD,rA (OE=1 Rc=1) 


[POWER mnemonics: sfze, sfze., sfzeo, sfzeo. ] 


Reserved 





0 5 6 10 11 15 16 20 21 22 30 31 


rDe —(rA) + XER[CA] 
The sum -(rA) + XER[CA] is placed into rD. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ XER: 
Affected: CA 
Affected: SO, OV (if OE=1) 
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sync sync 


Synchronize Integer Unit 


[POWER mnemonic: des] 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The sync instruction provides an ordering function for the effects of all instructions 
executed by a given processor. Executing a sync instruction ensures that all instructions 
previously initiated by the given processor appear to have completed before any subsequent 
instructions are initiated by the given processor. When the sync instruction completes, all 
external accesses initiated by the given processor prior to the syne will have been 
performed with respect to all other mechanisms that access memory. 


The sync instruction can be used to ensure that the results of all stores into a data structure, 
performed in a “critical section” of a program, are seen by other processors before the data 
structure is seen as unlocked. The eieio instruction may be more appropriate than sync for 
cases in which the only requirement is to control the order in which external references are 
seen by I/O devices. 


Other registers altered: 


« None 
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tlbie tlbie 


Translation Lookaside Buffer Invalidate Entry Integer Unit 


tIbie rB 
[POWER mnemonic: tlbi] 





Reserved 
0 5 6 10 11 15 16 20 21 30 31 
EA(rB) 
if UTLB entry exists for EA, then 
UTLB entry~invalid 


EA is the contents of rB. The translation lookaside buffer (referred to as the UTLB) 
containing entries corresponding to the EA are made invalid (i.e., removed from the 
UTLB). Additionally, a TLB invalidate operation is broadcast on the system interface. 


The UTLB search is done regardless of the settings of MSR[IT] and MSR[DT]. 
Block address translation for EA, if any, is ignored. 


If the segment register for EA specifies SR[T]=1 (an I/O controller interface segment), no 
UTLB entry invalidation is performed on the local processor and no TLB invalidate 
operation is broadcast on the system interface. 


Because the MPC601 supports broadcast of TLB entry invalidate operations, then the 
following must be observed: 


¢ The tlbie instruction(s) must be contained in a critical section, controlled by 
software locking, so that tlbie is issued on only one processor at a time. 


¢ A Sync instruction must be issued after every tlbie and at the end of the critical 
section. This causes the hardware to wait for the effects of the preceding tibie 
instructions(s) to propagate to all processors. 


A processor detecting a TLB invalidate broadcast performs the following: 
1. Prevents execution of any new load, store, cache control or tlbie instructions and 
prevents any new reference or change bit updates 


2. Waits for completion of any outstanding memory operations (including updates to 
the reference and change bits associated with the entry to be invalidated) 


3. Invalidates the two entries (both associativity classes) in the UTLB indexed by the 
matching address 


4, Resumes normal execution 





10-208 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


This is a supervisor-level instruction. 


The software must ensure that SDR1 points to the page table when issuing tlbie, even when 
address translation is disabled. Nothing is guaranteed about instruction fetching in other 
processors if the tlbie instruction deletes the page in which some other processor 1s 
currently executing. 


This instruction is optional in the PowerPC architecture. 


Other registers altered: 


« None 
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tw tw 
Trap Word Integer Unit 


tw TO,rA,rB 
[POWER mnemonic: t] 


Reserved 





at EXTS(rA) 

he EXTS(rB) 

if (a <b) & TO[O] then TRAP 
if (a> b) & TO[1] then TRAP 
if (a= b) & TO[2] then TRAP 
if (a <U b) & TO[3] then TRAP 
if (a>U b) & TO[4] then TRAP 


The contents of rA are compared with the contents of rB. If any bit in the TO field is set to 
1 and its corresponding condition is met by the result of the comparison, then the system 
trap handler is invoked. 
Other registers altered: 


* None 
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twi twi 


Trap Word Immediate Integer Unit 


twi TO,rA,SIMM 
[POWEP. mnemonic: ti] 


5 6 


0 10 11 15 16 31 


ac EXTS(rA) 

if (a < EXTS(SIMM)) & TO[0] then TRAP 

if (a > EXTS(SIMM)) & TO[1] then TRAP 

if (a= EXTS(SIMM)) & TO[2] then TRAP 

if (a <U EXTS(SIMM)) & TO[3] then TRAP 

if (a >U EXTS(SIMM)) & TO[4] then TRAP 
The contents of rA are compared with the sign-extended SIMM field. If any bit in the TO 
field is set to 1 and its corresponding condition is met by the result of the comparison, then 
the system trap handler is invoked. 


Other registers altered: 


« None 
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XOrx XOrx 


XOR Integer Unit 
xor rA,rS,rB (Rc=0) 
xor. rA,rS,rB (Rc=1) 
Fae Des ee ae (a 
10 11 15 16 20 21 30 31 
rAe(rS) ® (rB) 


The contents of rAis XORed with the contents of rB and the result is placed into rA. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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xori xori 


XOR Immediate Integer Unit 


xori rA,rs,UIMM 
[POWER mnemonic: xoril] 


0 


5 6 10 11 15 16 31 
rA€(rS) ® ((16)0 Il UIMM) 
The contents of rS is XORed with x’0000' || UIMM and the result is placed into rA. 


Other registers altered: 


« None 
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xoris XOris 
XOR Immediate Shifted Integer Unit 


xoris rA,rS,UIMM 


[POWER mnemonic: xoriu] 


0 5 6 10 11 15 16 31 
rA<(rS) ® (UIMM II (16)0) 
The contents of rS is XORed with UIMM I! x'0000’ and the result is placed into rA. 
Other registers altered: 


¢ None 
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10.3 Instructions Not Implemented by the MPC601 


Table 10-6 provides a list of 32-bit instructions that are not implemented by the MPC601, 
and that generate an illegal instruction exception. Refer to Appendix C, “PowerPC 
Instructions Not Implemented in MPC601”, for a more detailed description of the 
instructions. 


Table 10-6. 32-Bit Instructions Not Implemented by the MPC601 


j Mnemonic | otucton 
[ies | Prins a fect esinatsSiePedaon 
tere Resin spo Save Ret xin 
[edt [Rone PantSoees 
[sgn [Pasrgramemaeron 
ears 
Co 
[en 
ie 
















[ice | Wastin asa ulna Eno raen | 


Table 10-7 provides a list of 32-bit SPR encodings that are not implemented by the 
MPC6()1. 


Table 10-7. 32-Bit SPR Encodings Not Implemented by the MPC601 


Name 


os | oa | me 

















10000 11101 DBAT2L Supervisor 
10000 11110 DBAT3U Supervisor 
10000 11111 DBAT3L Supervisor 


Table 10-8 provides a list of 64-bit instructions that are not implemented by the MPC601, 
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and that generate an illegal instruction exception. Refer to Appendix C, “PowerPC 
Instructions Not Implemented in MPC601”. 


Table 10-8. 64-Bit Instructions Not Implemented by the MPC601 


Count Leading Zeros Double Word 
}divd =| Divide Double Word 
Divide Double Word Unsigned 


Extend Sign Word 



























Floating Convert From Integer Double Word 





fetid =| Floating Convert to Integer Double Word 
fetidz =| Floating Convert to Integer Double Word with Round to Zero 
}id | Load Double Word 


Load Double Word and Reserve Indexed 






fr [en ie Wain Oped 
[ex —[eesbaie Wedded 
a 
Ce 
Piox [ase Weaabacied 
muti [wtp Deoie es 
a 
Ec 
[iia [Potts ts Oa Wan eta 
[iter [Poteet ae atin Gow Rt 
ite [Poteet ei vod median on Gear 
ise) [Rote La oe Wa se nonce tot 
Ce 
ini [Rots La Ou Wad edi enlsset 
[sie [Silent 
[oie [SB ivaloeny 
stir | SB ivelecneyoyae 
faa [Sitistonwewed 
FS 
Ee 
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Table 10-8. 64-Bit Instructions Not !mplemented by the MPC601 (Continued) 


Shift Right Double Word 







Store Double Word 


| Moemenie 

a 

a 

cs 
[sau [Seretonie wean tsa 
[sau | See Caie wowindes in Upaao 
[sex | Seetaniewemmes 
fs _[fepoaiowes 
[st [Tap onic wocinedas 


Table 10-9 provides the 64-bit SPR encoding that is not implemented by the MPC601. 
Table 10-9. 64-Bit SPR Encoding Not Implemented by the MPC601 


| Decimal SPR[5-9] | SPR[O-4] 
01000 11000 







Register 
Name 












Supervisor 
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Appendix A 
Instruction Set Listings 


This appendix lists the instruction set implemented in the MPC601 and the additional 
PowerPC instructions not implemented in the MPC60, first sorted by mnemonic and then 
by opcode. 


A.1 Complete Instruction List Sorted by Mnemonic 


Table A-1 lists the instructions implemented in the MPC6(0)1 plus those defined in the 
PowerPC architecture that are not implemented in the MPC601 in alphabetical order by 
mnemonic. | 


Table A-1. Complete Instruction List Sorted by Mnemonic 
Key: 


Reserved bits 


Instruction not implemented in the MPC601 





Name 0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
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Name 
bx 
bex 
bectrx 
belrx 
cles 
cmp 
cmpi 
cmp! 


cmpli 


entlzwx 
crand 
crandc 
creqv 
crnand 
crnor 
cror 
crore 
crxor 
debf 
debi 
debst 
debt 
dcbtst 
dcbz 


divx 


divsx 
divwx 
divwux 
dozx 


dozi 


0 


_ 
—_— | = 


ee ee ee ee (¢>) = | @wW wfwl]l — 
oOo}; oOo}; © — oOo; — — | © 


1 


= _ — _ pare 


wy} w& 


wo 
pair 


_ _ = 


P 


6 7 8 § 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


ee ae 








owerPC 601 RISC Microprocessor User's Manual MOTOROLA 


Name 0 6 7 8 9 10 11 12 138 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


fmsubx 


fmsubsx 


fmulx 
fmulsx 


fnabsx 
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6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


fsubx 


fsubsx 
icbi 


isync 
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Name 0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


_ 


w 
are 


— @ mere G@ ©o = _ 


maskgx 
maskirx 


merf 


oO 


mcerfs 


merxr 


w 
= 


mfcr 


[o> 


mffsx 


wo 
— 


mfmsr 


@ 
—_ 


mfspr 


mfsr 


w 
= 


mfsrin 


= _ Ww = Ww ow = 
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Name 0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


i 
LE 
iM 


> 


; 
; 
> 
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Name 0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 





sraqx 
sraiqx 
stawx 
srawix 


srd 





MOTOROLA Appendix A. Instruction Set Listings A-7 


Name 0 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 


a1 
f 


fox) 
on 


is) 


_ — 


ine) 
are 
= 

” 


stfs 


stfsu 


pare 
as 


stfsux 


_ 


stfsx 


@ 
a 


> 
om 


sth 


oN 
b 


= 


sthbrx 
sthu 


> 
—+~ 1) oOo 


sthux 


sthx 


hh 
Nb 


stmw 


= 


stswi 


@ 
part 


stswXx 


i¢e) 
cop) 


stw 
stwbrx 


stwex. 


w 
— )} 


_ a 


subfcx 
subfex 


subfic 


(6) 


subfmex 


frS 

2 
a ae 
eae ae 
(pam Se ae ee 
ee oe ee 
ana (ae ae 
Lee | 


subfzex 


w 


‘ 
—_+~_ | 
; 
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MOTOROLA 


6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
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A.2 PowerPC Instruction List Sorted by Opcode 


Table A-2 lists only the instructions defined in the PowerPC architecture that are 
implemented in the MPC601 in numeric order by opcode. It does not include the MPC601- 
only instructions implemented for POWER architecture compatibility. It also does not 
include the PowerPC instructions not implemented on the MPC601. 


Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode 


Primary Extended F 
3 Trap Word Immediate 
Multiply Low Immediate 


| subfic | Subtract from Immediate Carrying 







Compare Logical Immediate 


Compare Immediate 


=. 
ins) 


Add Immediate Carrying 


_ 
w 


Add Immediate Carrying and Record 
Add Immediate 
Add Immediate Shifted 


fh 


[a 
fat 
eae 
[eer [Sh Conte 
a 
a 
a 


= 
co 


a]Jjof, of af oa oa eo ee a/7,of,oi/]/ wf a 4 — NI 
olor; o] oF © oo] of o oO; ni ao — 10 

w 

Lev) 


ao 
=k + 
fop) 


— 
o 
—_ 
N 
wo 


ol 
(=) 


[ener [ onitonRepte OR 
[een Coton Fea 


=k 
o 
=i 
co 
ive) 


[a7 [so [conan Rew on win Copeman 
[ae [aor [CmniontegseroR 
: 
[ [nie [Reta ones nan a ah Hose 
2 |____ [vn [Rost tt Wa mei nen ND wih ask | 











EAS 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 


Extended 
Opcode 
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Opcode 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 
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Extended 
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Opcode 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 


Primary Extended 
Opcode Opcode 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 


Opcode Opcode 

fresx Floating-Point Reciprocal Estimate Single-Precision 

59 nee fmulsx Floating-Point Multiply Single-Precision 

fs [28 | matiox | FoatrPom ati A Sg Pecin 
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Se ee Floating-Point Convert to Integer Word with Round Toward 
Zero 


fest te | tiv | Floating-PointDivide | Floating-PointDivide Divide 

a ea 
Zs 
eee! 
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fmaddx Floating-Point Multiply-Add 
fnmsubx Floating-Point Negative Multiply-Subtract 
fnmaddx Floating-Point Negative Multiply-Add 
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Table A-2. PowerPC Instructions Implemented by MPC601: by Opcode (Continued) 


Primary Extended 
Opcode Opcode 


Hfempo | Floating-Point Compare Ordered 










Move to FPSCR Field Immediate 
Floating-Point Negative Absolute Value 






Move to FPSCR Fields 
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Appendix B 
POWER Architecture Cross Reference 


This section identifies the incompatibilities that must be managed in migration from the 
POWER architecture to PowerPC architecture. Some of the incompatibilities can, at least 
in principle, be detected by the processor, which traps and lets software simulate the 
POWER operation. Others cannot be detected by the processor. 


In general, the incompatibilities identified here are those that affect a POWER application 
program. Incompatibilities for instructions that can be used only by POWER system 
programs are not discussed. Note that this section describes incompatibilities with respect 
to the PowerPC architecture in general. The MPC6()1 is more closely compatible with the 
POWER architecture. POWER instructions implemented in the MPC6()1 are listed in 
Table B-4. 


B.1 New Instructions, Formerly Privileged 
Instructions 


Instructions new to PowerPC typically use opcode values (including extended opcode) that 
are illegal in the POWER architecture. A few instructions that are privileged in the POWER 
architecture (for example, delz, called dcbz in the PowerPC architecture) have been made 
non-privileged in the PowerPC architecture. Any POWER program that executes one of 
these now-valid or now-non-privileged instructions, expecting to cause the system illegal 
instruction error handler (program exception) or the system privileged instruction error 
handler to be invoked, will not execute correctly on PowerPC processors. 


B.2 Newly Privileged Instructions 


The following instructions are user-level in the POWER architecture but are supervisor- 
level in PowerPC processors. 


¢ mfmsr 
¢ mfsr 


B.3 Reserved Bits in Instructions 


These are shown with '/'s in the instruction opcode definitions. In the POWER architecture 
such bits are ignored by the processor. In PowerPC architecture they must be 0 or the 
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instruction form is invalid. In several cases the PowerPC architecture assumes that such bits 
in POWER instructions are indeed (). The cases include the following: 

* cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions is 0). 

¢ mtspr and mfspr assume that bits 16-20 in the POWER instructions are (). 


B.4 Reserved Bits in Registers 


The POWER architecture defines these bits to be 0 when read, and either 0 or 1 when 
written to. In the PowerPC architecture it is implementation-dependent, for each register, 
whether these bits are 0 when read and ignored when written to or are copied from source 
to target when read or written to. 


B.5 Alignment Check 


The AL bit in the POWER machine-state register, MSR[24], is not supported in the 
PowerPC architecture. The bit is reserved in the PowerPC architecture. The low-order bits 
of the EA are always used. Notice that the value 0—the normal value for a reserved SPR 
bit—means “ignore the low-order EA bits” in the POWER architecture, and the value 1 
means “‘use the low-order EA bits.”” However, MSR[24] is not assigned new meaning in 
PowerPC. 


B.6 Condition Register 


The following instructions specify a field in the CR explicitly (via the BF field) and also 
have the record bit option. In the PowerPC architecture, if Rc=1 for these instructions the 
instruction form is invalid. In the POWER architecture, if Rc=1 the instructions execute 
normally except as shown in Table B-1. 


Table B-1. Condition Register Settings 
Ce oe 


CRO is undefined if Rc=1 and BF+0 
CRO is undefined if Rc=1 and BF#0 


CR1 is undefined if Re=1 
CR1 is undefined if Rc=1 and BF#1 


B.7 Inappropriate Use of LK and Rc bits 


For the instructions listed below, if LK=1 or Rc=1, POWER processors execute the 
instruction normally with the exception of setting the link register (if LK=1) or the 


|fempu CR1 is undefined if Ro=1 
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condition register field 0 or 1 (if Rc=1) to an undefined value. In the PowerPC architecture, 
such instruction forms are invalid. 


PowerPC instruction form is invalid if LK=1: 


* sc (sve in the POWER architecture) 

¢ Condition register logical instructions 
* merf 

¢ isync (ics in the POWER architecture) 


PowerPC instruction form is invalid if Rc=1: 


¢ Integer X-form load and store instructions 

¢ Integer X-form compare instructions 

¢ X-form trap instruction 

* mtspr, mfspr, mtcrf, mcrxr, mfcr, mtmsr, mfmsr, mtsr, mtsrin, tlbi, eciwx, 
ecowx, cles, mfsr, mfsrin, sync, eieio, icli 

* Floating-point X-form load and store instructions and floating-point compare 
instructions 

* merfs 

¢* dcbz (delz in the POWER architecture) 


B.8 BO Field 


The POWER architecture shows certain bits in the BO field—used by branch conditional 
instructions—as x without indicating how these bits are to be interpreted. These bits are 
ignored by POWER processors. The PowerPC architecture treats these bits differently, as 
shown in Table B-2. 


Table B-2. Differences in the BO Field 
BO[0-3] The PowerPC architecture shows the bits as z. If itis not cleared, the instruction form is 
invalid. 


BO[4] This bit, which is shown as x in the POWER architecture independent of the other four 
bits—is shown in the PowerPC architecture as y. It gives a hint about whether the 
branch is likely to be taken. If a POWER program has the wrong value for this bit, the 


program runs correctly but performance may suffer. 












B.9 Branch Conditional to Count Register 


For the case in which the count register is decremented and tested (that is, the case in which 
BO[2]=0), the POWER architecture specifies only that the branch target address is 
undefined, implying that the count register, and the link register (if LK=1), are updated in 
the normal way. The PowerPC architecture considers this instruction form invalid. 
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B.10 System Call/Supervisor Call 


The System Call (sc) instruction in the PowerPC architecture is called Supervisor Call 
(svcx) in the POWER architecture. Differences in implementations are as follows: 


¢ The POWER architecture provides a version of the Supervisor Call instruction (bit 
3() = 0) that allows instruction fetching to continue at any one of 128 locations. It is 
used for “fast SVCs.” The PowerPC architecture provides no such version 


¢ The POWER architecture provides a version of the Supervisor Call instruction (bits 
30-31 = b’11’) that resumes instruction fetching at one location and sets the link 
register to the address of the next instruction. The PowerPC architecture provides no 
such version; if bit 31 of the instruction is 1, the instruction form is invalid. 


¢ For the POWER architecture, information from the MSR is saved in the count 
register. For the PowerPC architecture, this information is saved in SRR1I. 


¢ The POWER architecture permits bits 16-29 of the Supervisor Call instruction to be 
non-zero, while in the PowerPC architecture, such an instruction form is invalid. 


¢ Bits 16-—29o0f the Supervisor Call instruction are regarded as reserved for the 
POWER architecture. As long as POWER compatibility is required for this 
instruction, bits 16-29 are ignored by the processor. . 

* The POWER architecture saves the low-order 16 bits of the Supervisor Call 
instruction in the count register; the PowerPC architecture does not save them. 


¢ The settings of the MSR bits by the system call exception differ between the 
POWER architecture and the PowerPC architecture. 


B.11 Update Forms of Memory Access 


The PowerPC architecture requires that rA not be equal to either rD (integer load only) or 
(). If the restriction is violated, the instruction form is invalid. See Appendix D, "Classes of 
Instructions,” for information about invalid instructions. The POWER architecture permits 
these cases and simply avoids saving the EA. 


B.12 Multiple Register Loads 


The PowerPC architecture requires that rA and rB if present in the instruction format, not 
be in the range of registers to be loaded, while the POWER architecture permits this and 
does not alter rA or rB in this case. (The PowerPC architecture restriction applies even if 
rA=(, although there is no obvious benefit to the restriction in this case since rA is not used 
to compute the effective address if rA=0.) If the PowerPC architecture restriction is 
violated, the instruction form is invalid. The instructions affected are listed as follows: 


¢ Imw (Im in the POWER architecture) 


¢ Iswi (Isi in the POWER architecture) 
¢ Iswx (Isx in the POWER architecture) 
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Thus, for example, an Imw instruction that loads all 32 registers is valid in the POWER 
architecture but is an invalid form in the PowerPC architecture. 


B.13 Alignment for Load/Store Multiple 


The PowerPC architecture requires the EA to be word-aligned and yields an alignment 
exception or boundedly undefined results if it is not. The POWER architecture specifies that 
an alignment exception occurs (if AL=1). 


B.14 Load String Instructions 


In the PowerPC architecture, an Iswx instruction with zero length leaves the content of rD 
undefined, while in the POWER architecture the corresponding instruction (Isx) does not 
alter rD. 


B.15 Synchronization 


The sync instruction (called dcs in the POWER architecture) causes a much more pervasive 
synchronization in the PowerPC architecture than in the POWER architecture. For more 
information, refer to Chapter 10), “Instruction Set.” 


B.16 Move to/from SPR 


Differences in how the Move to/from Special Purpose Register (mtspr and mfspr) 
instructions are as follows: 


* The SPR field is 10 bits long in the PowerPC architecture, but only 5 in POWER 
architecture. 


¢ The mfspr instruction can be used to read the decrementer (DEC) register in 
problem state (user) mode in the POWER architecture, but only in supervisor state 
in the PowerPC architecture. 


¢ Ifthe SPR value specified in the instruction is not one of the defined values, the 
PowerPC architecture considers the instruction form invalid. (In user mode, the 
allowed SPR values exclude those accessible only in supervisor mode.) The 
POWER architecture does not alter any architected registers in this case and 
generates a program exception if the instruction is executed in user mode and 
SPR[O]=1. 


For PowerPC processors except the MPC601 processor, a program exception is generated 
for an attempt to execute an mtspr or mfspr instruction with SPR[OQ—4]=0 (which denotes 
the MQ register). Similarly, a program exception is generated for attempts to execute an 
mfspr instruction with SPR[Q—4]=6 (which denotes reading the decrementer register in the 
POWER architecture). 
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B.17 Effects of Exceptions on FPSCR Bits FR and Fl 


For the following cases, the POWER architecture does not specify how the FR and FI bits 
are set, while the PowerPC architecture preserves them for illegal operation exceptions 
caused by compare instructions and clears them otherwise. 

¢ Invalid operation exception (enabled or disabled) 

¢ Zero divide exception (enabled or disabled) 

¢ Disabled overflow exception 


B.18 Floating-Point Store Instructions 


The POWER architecture uses FPSCR[UE] to help determine whether denormalization 
should be done, while the PowerPC architecture does not. Using FPSCR[UE] is in fact 
incorrect: in the PowerPC architecture if FPSCR[UE]=1 and a denormalized single- 
precision number is copied from one memory location to another by means of an Ifs 
instruction followed by an stfs instruction, the two “copies” may not be the same. 


B.19 Move from FPSCR 


The POWER architecture defines the high-order 32 bits of the result of mffs to be x’ FFFF 
FFFF’. In the PowerPC architecture they are undefined. 


B.20 Clearing Bytes in the Data Cache 


The delz instruction of the POWER architecture and the debz instruction of the PowerPC 
architecture have the same opcode. However, the functions differ in the following respects. 
¢ The delz instruction clears a line; dcbz clears a block (a sector in the MPC601). 
¢ The delz instruction saves the EA in rA (if rA40); dcbz does not. 

¢ The dcelz instruction is supervisor-level; dcbz is not. 


B.21 Segment Register Instructions 


The definitions of the four segment register instructions (mtsr, mtsrin, mfsr, and mfsrin) 
differ in two respects between the POWER architecture and the PowerPC architecture. 
Instructions similar to mtsrin and mfsrin are called mtsri and mfsri in the POWER 
architecture. 


Privilege—mfsr and mfsri are problem state instructions in the POWER architecture, 
while mfsr and mfsrin are privileged in the PowerPC architecture. 


Function—the indirect instructions (mtsri and mfsri) in the POWER architecture use an 
rA register in computing the segment register number, and the computed EA is stored into 
rA (if rA#0 and rA#rD); in the PowerPC architecture mtsrin and mfsrin have no rA field 
and EA is not stored. 
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The mtsr, mtsrin (mtsri), and mfsr instructions have the same opcodes in the PowerPC 
architecture as in the POWER architecture. The mfsri instruction in the POWER 
architecture and the mfsrin instruction in PowerPC architecture have different opcodes. 


B.22 TLB Entry Invalidation 


The tlbi instruction of the POWER architecture and the tlbie instruction of the PowerPC 
architecture have the same opcode. However, the functions differ in the following respects. 


¢ The tlbi instruction computes the EA as (rAlQ) + (rB), while tIbie lacks an rA field 
and computes the EA as (rB). 


¢ The tibi instruction saves the EA in rA (if rA40); tlbie lacks an rA field and does 
not save the EA. 


B.23 Timing Facilities 


This section describes differences between the POWER architecture and the PowerPC 
architecture timer facilities. 


B.23.1 Real-Time Clock 


The MPC601 implements a POWER-based RTC. Note that the POWER RTC is not 
supported in the PowerPC architecture. Instead, the PowerPC architecture provides a time 
base (TB). Both the RTC and the time base are 64-bit special purpose registers, but they 
differ in the following respects. 


¢ The RTC counts seconds and nanoseconds, while the TB counts “ticks.” The 
frequency of the RTC is implementation-dependent. 


¢ The RTC increments discontinuously—! is added to RTCU when the value in 
RTCL passes 999_999_999. The TB increments continuously—1 is added to TBU 
when the value in TBL passes x’ FFFF FFFP’. 

¢ The RTC is written and read by the mtspr and mfspr instructions, using SPR 
numbers that denote the RTCU and RTCD. The TB is written and read by new 
instructions (mttb, mttbu, mftb, and mftbu). 

¢ The SPR numbers that denote RTCL and RTCU are invalid in the PowerPC 
architecture except the MPC601. 

¢ The RTC is guaranteed to increment at least once in the time required to execute 10 
Add Immediate (addi) instructions. No analogous guarantee is made for the TB. 

¢ Not all bits of RTCL need be implemented, while all bits of the TB must be 
implemented. 


B.23.2 Decrementer 


The PowerPC architecture DEC register decrements at the same rate that the TB 
increments, while the POWER decrementers decrement every nanosecond (which is the 
same rate that the RTC increments). 
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Not all bits of the POWER DEC need be implemented, while all bits of the PowerPC DEC 
must be implemented. 


B.23.3 Deleted Instructions 
The following instructions are part of the POWER architecture but have been dropped from 


the PowerPC architecture. 
In MPC601 
Processor 


Table B-3. Deleted POWER Instructions 
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Table B-3. Deleted POWER Instructions (Continued) 


Sa eee 
Opcode =o Processor 
Shift | ShiftRight Extended == | ShiftRight Extended == 

Cae re em Ds 8 
faeg_[sitittenerdeswimua | | mv 
sig | Sitigitinmedaowmnia [sr [ee [ve 
eg | SitRgntonginmsawanw [av | ee [ve 
sig | sitrpntongwmwa [|e [ve 
Ca a A 2 
[sett | Siponeorcanwneneo ff [fm 


Note: Many of these instructions use the MQ register. The MQ is not defined in the PowerPC architecture, 
but is implemented in the MPC601 processor. 


B.24 POWER Instructions Supported by the MPC601 
Processor 
Table B-4 lists the POWER instructions implemented in the PowerPC architecture. 








Table B-4. POWER Instructions Implemented in PowerPC Architecture 
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— Immediate and [a Immediate Carrying 
Record and Record 
amefol.] Add to Minus One addme[o][.] a 
xtended 


azefol[ Add to Zero Extended | adazefolt. ——— 
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Table B-4. POWER Instructions Implemented in PowerPC Architecture (Continued) 


Count Leading Zeros Count a Zeros 
ee 
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Multiply-Add 


Floating Negative 
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| Load Multiple =| | Load Multiple =| fimw =| Load Multiple Word | | Load Multiple Word | Word 

Load String Immediate wow foe String Word 
Immediate 

| Load String Indexed | Load String Indexed Indexed fiswx | Load String Word Indexed | Load String Word Indexed | Word Indexed 






Load with Update som oe Word and Zero with 
Update 

Load with Update Indexed Load Word and Zero with 
Update Indexed 

Load Indexed Load Word and Zero 
Indexed 

Move to Segment Move to Segment 

ee Indirect Register Indirect * 

pmui | Mutiply Immediate jm Multiply Low Immediate 
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Table B-4. POWER Instructions Implemented in PowerPC Architecture (Continued) 


PowerPC 


vu 
(e) 
= 
m 
D 






Rotate Left Word 
Immediate then Mask 
Insert 


Rotate Left Immediate 
then Mask Insert 


Rotate Left Immediate 
then AND With Mask 
Rotate Left then AND with 
Mask 


sfe[o][.] Subtract from Extended 


4 Subtract from Immediate 
Subtract from Minus One 
Extended 
Subtract from Zero 
Extended 
Shift Right Algebraic 
Immediate 
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Store Byte-Reverse 
Indexed 


Store String Word 


Store String Immediate 
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Store String Indexed Store String Word 
Indexed 
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Rotate Left Word 
Immediate then AND with 
Mask 






Rotate Left Word then 
AND with Mask 





subfc[o][.] Subtract from Carrying 


subfe[o][.] 


Subtract from Immediate 
Carrying 









subfmef[o][.] 





subfze[o][.] 





Shift Left Word 
Shift Right Word 






Shift Right Algebraic Word 


Shift Right Algebraic Word 
Immediate 


Store Word 


Store Word Byte-Reverse 
Indexed 


Store Multiple Word 
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Table B-4. POWER Instructions Implemented in PowerPC Architecture (Continued) 


ee 














XOR Immediate Lower mor XOR Immediate 
XOR Immediate Upper Jxoris XOR Immediate Shifted 


* Supervisor-level instruction 
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Appendix C 
PowerPC Instructions Not Implemented 
in MPC601 


This appendix provides a list of 32-bit and 64-bit instructions that are not implemented by 
the MPC601, and that generate an illegal instruction exception. It also provides the 32-bit 
and 64-bit SPR encodings that are not implemented by the MPC60)1. 


See Table C-1 for a list of the 32-bit instructions not implemented by the MPC601. 
Table C-1. 32-Bit Instructions Not Implemented by the MPC601 


[ menonio | eseucton 
[ies Posing Pesan Sg Pedaon 
ice Posing Po Represal Sears ost Ese 
[sat [FestnpPomseet 
a cae cara 
rat 
sim 
ie 
tier 














Table C-2 provides a list of 32-bit SPR encodings that are not implemented by the 
MPC601. 
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Table C-2. 32-Bit SPR Encodings Not Implemented by the MPC601 


Register 
SPRI[5-9] | SPR[O-4] name 


01000 11100 


Supervisor 


01000 Supervisor 
10000 Supervisor 
10000 Supervisor 


10000 Supervisor 
10000 Supervisor 
10000 Supervisor 
10000 Supervisor 
10000 Supervisor 
10000 Supervisor 


Table C-3 provides a list of 64-bit instructions that are not implemented by the MPC601, 
and that generate an illegal instruction exception. 





Table C-3. 64-Bit Instructions Not Implemented by the MPC601 
entizd =| Count Leading Zeros Double Word 
Divide Double Word 


tefid | Floating Convert From Integer Double Word 
fetid =| Floating Convert to Integer Double Word 










fas [sestoniowoswinunsas 
Ca 
[ex [tandbaoieWecinees 
[me [seeWeaaebe 
[hao 
Tnx 
mais 









Load Word Algebraic with Update Indexed 
Load Word Algebraic Indexed 
Multiply Low Double Word 
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Table C-3. 64-Bit Instructions Not Implemented by the MPC601 (Continued) 


} Mnemonic | arco 
Ec 
muna [ My Hghtoae We Usgred | 
[rect [Rose ati oan Gears 
rider [Rass at bolo woth Cer —d 
rie [Potato ne Wasim an ear 
ise [Pos a il indi ten Ceara 


Rotate Left Double Word !mmediate then Clear Right 















Rotate Left Double Word Immediate then Mask Insert 
SLB Invalidate All 

SLB Invalidate Entry 

SLB Invalidate Entry by Index 

Shift Left Double Word 

Shift Right Algebraic Double Word 

Shift Right Algebraic Double Word Immediate 
Shift Right Double Word 

Store Double Word 

Store Double Word Conditional Indexed 
Store Double Word with Update 

Store Double Word Indexed with Update 
Store Double Word Indexed 

Trap Double Word 


Trap Double Word Immediate 





Table C-4 provides the 64-bit SPR encoding that is not implemented by the MPC6(0)1. 
Table C-4. 64-Bit SPR Encoding Not Implemented by the MPC601 


Register 
| Decimal | SPR[5-9] | SPR[O—4] Name 


01000 11000 ASR 
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centizd Not Implemented in MPC601 cntizd 





Count Leading Zeros Double Word Integer Unit 
cntlzd rA,rs (Rc=0) 
cntlzd. rA,rS (Rc=1) 
Reserved 
0 5 6 10 11 15 16 20 21 30 31 
NCO 


do while N<64 
if rS[NJ=1 then leave 
N¢N+1 

rA¢N 


A count of the number of consecutive zero bits starting at bit 0 of register rS is placed into 
rA. This number ranges from 0) to 64, inclusive. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (Rc=1) 

divd Not Implemented in MPC601 divd 
Divide Double Word Integer Unit 
divd rD,rA,rB (OE=0 Rc=0) 

divd. rD,rA,rB (OE=0 Rc=1) 

divdo rD,rA,rB (OE=1 Rc=0) 

divdo. rD,rA,rB (OE=1 Rc=1) 

Ee ee ee; 
0 5 6 10 11 15 16 20 21 22 30 31 


dividend[O-63]<—rA 
divisor[0-63]<-rB 
rDdividend+divisor 
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The 64-bit dividend is rA. The 64-bit divisor is rB. The 64-bit quotient of the dividend and 
divisor is placed into rD. The remainder is not supplied as a result. 


Both the dividend and the divisor are interpreted as signed integers. The quotient is the 
unique signed integer that satisfies 


dividend=(quotient*divisor)+r 


where () <r < Idivisorl if the dividend is nonnegative, and -Idivisorl < rs 0 if the dividend is 
negative. 


If an attempt is made to perform any of the divisions 


Ox8000_0000_0000_0000 = -1 
<anything> + () 


then the contents of rD are undefined. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
* Exception Register: 

Affected: SO, OV (if OE=1) 
divdu Not Implemented in MPC601 divdu 
Divide Double Word Integer Unit 
divdu rD,rA,rB (OE=0 Rc=0) 
divdu. rD,rA,rB (OE=0 Rc=1) 
divduo rD,rA,rB (OE=1 Rc=0) 
divduo. rD,rA,rB (OE=1 Rc=1) 
ea eae an a ee ee 

10 11 15 16 20 21 22 30 31 


dividend[O—63]¢—rA 

divisor[0-63 ]<—-rB 

rD<dividend+tdivisor 
The 64-bit dividend is rA. The 64-bit divisor is rB. The 64-bit quotient of the dividend and 
divisor is placed into rD. The remainder is not supplied as a result. 
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Both the dividend and the divisor are interpreted as unsigned integers. The quotient is the 
unique unsigned integer that satisfies 


dividend=(quotient*divisor)+r 

where () <r < divisor. 

If an attempt is made to perform the division 
<anything> + 0 

then the contents of rD are undefined. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 





Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 

Affected: SO, OV (if OE=1) 
extsw Not Implemented in MPC601 extsw 
Extend Sign Word 
extsw rA,rS (Rc=0) 
extsw. rA,rS (Rc=1) 

Reserved 

0 5 6 10 11 15 16 20 21 30 31 

s¢r8S[32] 


rA[32-63 }—rS[32-63] 
rA[0-31]¢-(32)s 


Register rS[32-63] are placed into rA[32-63]. Bit 32 of rS is placed into rA[0-31]. — 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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fcfid Not Implemented in MPC601 fcfid 


Floating Convert from Integer Double Word Floating-Point Unit 
fcfid frD,frB (Rc=0) 
fcfid. frD,frB (Rc=1) 


Reserved 





0 > 6 10 114 15 16 20 21 30 31 


The 64-bit signed fixed-point operand in register frB is converted to an infinitely precise 
floating-point integer. If the result of the conversion is already in double-precision range it 
is placed into register frD. Otherwise the result of the conversion is rounded to double- 
precision using the rounding mode specified by FPSCR[RN] and placed into register frD. 


FPSCR[FPRF] is set to the class and sign of the result. FPSCR[FR] is set if the result is 
incremented when rounded. FPSCR[FI] is set if the result is inexact. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 
Affected: FPRF, FR, FI, FX, XX 


fctid Not Implemented in MPC601 fctid 
Floating Convert to Integer Double Word 

fetid frD,frB (Rc=0) 

fetid. frD,frB (Rc=1) 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The floating-point operand in frB is converted to a 64-bit signed fixed-point integer, using 
the rounding mode specified by FPSCR[RN], and placed into frD. 


If the operand in frB is greater than 2(63)-1, then frD is set to Ox7FFF_FFFF_FFFF_FFFF. 
If the operand in frB is less than -2(63), then frD is set to Ox8000_0000_0000_0000. 
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Except for enabled invalid operation exceptions, FPSCR[FPRF] is undefined. FRSCR[FR] 
is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN VXCVI 


fctidz Not Implemented in MPC601 fctidz 
Floating Convert to Integer Double Word 

fctidz frD,frB (Rc=0) 

fctidz. frD,frB (Rc=1) 


Reserved 





The floating-point operand in frB is converted to a 64-bit signed fixed-point integer, using 
the rounding mode round toward zero, and placed into frD. 


If the operand in frB is greater than 2(63)-1, then frD is set to 0x7FFF_FFFF_FFFF_FFFF. 
If the operand in frB is less than -2(63), then frD is set to Ox8000_0000_0000_0000. 


Except for enabled invalid operation exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] 
is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN VXCVI 
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fresx Not Implemented in MPC601 fresx 


Floating-Point Reciprocal Estimate Single-Precision Floating-Point Unit 
fres frD,frB (Rc=0) 
fres. frD,frB (Rc=1) 

Reserved 






5 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


A single-precision estimate of the reciprocal of the floating-point operand in register frB is 
placed into register frD. The estimate placed into register frD is correct to a precision of 
one part in 256 of the reciprocal of frB. 


Operation with various special values of the operand is summarized below. 


Operand Result Exception 
-00 -0 None 

-0 -00%* ZX 

+0 oo* ZX 

oo +0 None 
SNaN QNaN** VXSNAN 
QNaN QNaN None 


* No result if FPSCR[ZE]=1. 
** No result if FPSCR[VEJ=1. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1 and zero divide exceptions when FPSCR[ZE]=1. 
Other registers altered: 
* Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FX, OX, UX, ZX, VXSNAN, FPRE, FR (undefined), FI (undefined) 
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frsqrte Not Implemented in MPC601 frsqrte 


Floating-Point Reciprocal Square Root Estimate Floating-Point Unit 
frsqrte frD,frB (Rc=(0) 
frsqrte. frD,frB (Rc=1) 

Reserved 





0 5 6 10 11 15 16 20 21 25 26 30 31 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


A double-precision estimate of the reciprocal of the square root of the floating-point 
operand in register frB is placed into register frD. The estimate placed into register frD is 
correct to a precision of one part in 32 of the reciprocal of the square root of frB. 


Operation with various special values of the operand is summarized below. 


Operand Result Exception 
~00 QNaN** VXSQRT 
<0 QNaN** VXSQRT 
-0 -00* ZX 

+0 oo* ZX 

-fo0 +0 None 
SNaN QNaN** VXSNAN 
QNaN QNaN None 


* No result if FPSCR[ZE]=1. 
** No result if FPSCR[VE]=1. 


FPSCR[FPRF] is set to the class and sin of the result, except for invalid operation 
exceptions when FPSCR[VE]=1 and zero divide exceptions when FPSCR[ZE]=1. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FX, OX, UX, ZX, VXSNAN, FPRF, FR (undefined), FI (undefined) 
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fselx Not Implemented in MPC601 fselx 
Floating-Point Select Floating-Point Unit 


fsel frD,frA,frC,frB (Rc=0) 
fsel. frD,frA,frC,frB (Rc=1) 


10 11 15 16 20 21 25 26 30 31 


if (frA) 2 0.0 then frD<-(frC) 

else frD<—(frB) 
This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


The floating-point operand in register frA is compared to the value zero. If the operand is 
greater than or equal to zero, register frD is set to the contents of register frC. If the operand 
is less than zero or is a NaN, register frD is set to the contents of register frB. The 
comparison ignores the sign of zero (i.e., regards +0 as equal to -()). 
Other registers altered: 

¢ Condition Register (CR1 Field): 

Affected: FX, FEX, VX, OX (if Rc=1) 

Care must be taken in using fsel if IEEE compatibility is required, or if the values being 
tested can be NaNs or infinities. 


fsq rtx Not Implemented in MPC601 fsq rtx 
Floating-Point Square Root [Single-Precision] Floating-Point Unit 
fsqrt frD,frB (Rc=(0) 
fsqrt. frD,frB (Rc=1) 

Reserved 
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fsqrts frD,frB (Rc=0) 
fsqrts. | frD,frB (Rc=1) 


Reserved 





20 





16 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


The square root of the floating-point operand in register frB is placed into register frD. 


If the most significant bit of the resultant significand is not a one the result is normalized. 
The result is rounded to the target precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into register frD. 


Operation with various special values of the operand is summarized below. 


Operand Result Exception 
~00 QNaN* VXSQRT 
<0 QNaN* VXSQRT 
-0) -0 None 
+oo foo None 
SNaN QNaN* VXSNAN 
QNaN QNaN None 


* No result if FPSCR[VE]=1. 
FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE]=1. 
Other registers altered: 
¢ Condition Register (CR1 Field): 
Affected: FX, FEX, VX, OX (if Rc=1) 
¢ Floating-point Status and Control Register: 
Affected: FX, XX, VXSQRT, VXSNAN, FPRF, FR, FI 
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Id Not Implemented in MPC601 Id 
Load Double Word Integer Unit 


Id rD,ds(rA) 






0 5 6 10 11 15 16 29 30 31 


if rA=0 then bé—-0 

else berA 

EA—-b+EXTS(dsll0b00) 

rD#+MEMC(EA, 8) 
EA is the sum (rAl0)+(dsllOb00). The double word in storage addressed by EA is loaded 
into rD. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

Idarx Not Implemented in MPC601 idarx 
Load Double Word and Reserve Indexed Integer Unit 
Idarx rD,rA,rB 


Reserved 





if rA=0 then b¢—0 

else berA 
EA<—b+rB 

RESERVE<-1 
RESERVE_ADDR¢<func(EA) 
rD<-MEMCEA, 8) 


EA is the sum (rAlQ)+(rB). The double word in storage addressed by EA is loaded into rD. 
This instruction creates a reservation for use by a store double word conditional instruction. 


An address computed from the EA is associated with the reservation, and replaces any 
address previously associated with the reservation. 
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EA must be a multiple of 8. If it is not, the system alignment error handler may be invoked 
or the results may be boundedly undefined. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

Idu Not Implemented in MPC601 Idu 
Load Double Word with Update Integer Unit 
Idu rD,ds(rA) 

eae ae ae ae ae ee. ee ee) 
0 5 6 10 11 15 16 29 30 31 


EA€rA+EXTS(dsll0b00) 
rD¢-MEMCEA, 8) 
rA<-EA 


EA is the sum (rA)+(dsllOb00). The double word in storage addressed by EA is loaded into 
rD. 


EA is placed into rA. 
If rA=0 or rA=rD, the instruction form is invalid. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 
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Idux Not Implemented in MPC601 Ildux 





Load Double Word with Update Indexed Integer Unit 
Idux rD,rA,rB 
Reserved 
0 5 6 10 11 15 16 20 21 30 31 
EA<-rA+rB 
rD<-MEM(EA, 8) 
rA<-EA 


EA is the sum (rA)+(rB). The double word in storage addressed by EA is loaded into rD. 
EA is placed into rA. 
If rA=0 or rA=rD, the instruction form is invalid. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction to be invoked. 


Idx Not Implemented in MPC601 Idx 
Load Double Word Indexed 


Idx rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA= 0 then b¢—-0 
else berA 
EA¢-b+rB 
rD<MEMCEEA, 8) 


EA is the sum (rAl0)+(rB). The double word in storage addressed by EA is loaded into rD. 


This instruction is defined only for 64-bit implementations. Using ‘t on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 
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Ilwa Not Implemented in MPC601 lwa 


Load Word Algebraic Integer Unit 
lwa rD,ds(rA) 

Ss ae a a ee eee Fa 
0 5 6 10 11 15 16 29 30 31 


if rA=0 then b¢—0 

else berA 
EA€—b+EXTS(dsll0b00) 
rD¢<-EXTS(MEMCEA, 4)) 


EA is the sum (rAl0)+(ds!lI0b00). The word in storage addressed by EA is loaded into 
rD[32-63]. Register rD[0-31] are filled with a copy of bit 0 of the loaded word. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 





None 
lwaux Not Implemented in MPC601 lwaux 
Load Word Algebraic with Update Indexed 
Iwaux rD,rA,rB 
Reserved 

0 5 6 10 11 15 16 20 21 30 31 

EA<-rA+rB 

rD<-EXTS(MEM(EA, 4)) 

rA¢-EA 


EA is the sum (rA)+(rB). The word in storage addressed by EA is loaded into rD[32-63]. 
Register rD[0-31] are filled with a copy of bit 0 of the loaded word. 


EA is placed into rA. 


If rA=0 or rA=rD, the instruction form is invalid. 
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This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 


lwax Not Implemented in MPC601 lwax 
Load Word Algebraic Indexed 


lwax rD,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b¢-0 

else berA 
EA¢—-b+rB 
rD<EXTS(MEMC(EA, 4)) 


EA is the sum (rAl0)+(rB). The word in storage addressed by EA is loaded into rD[32-63]. 
Register rD[0-31] are filled with a copy of bit 0 of the loaded word. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

mftb Not Implemented in MPC601 mitb 
Move from Time Base Integer Unit 
mftb rD,TBR 


Reserved 





N<TBR[5-9] Il TBR[O-4] 
if N=268 then 
if (64-bit implementation) then 
rD<TB 
else 
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rD — TB[32-63] 
else if N=269 then 
if (64-bit implementation) then 
rD@(G2)0 Il TB[O-31] 
else 


rD€-TB[0-31] 


The TBR field denotes either the time base or time base upper, encoded as shown in 
Table C-5. The contents of the designated register are copied to rD. When reading Time 
Base Upper on a 64-bit implementation, the high-order 32 bits of rD are set to zero. 


Table C-5. TBR Encodings for mftb 


TBR* Register 
TBR[5-9] TBR[0-4] Name 


*Note that the order of the two 5-bit halves of the TBR number is 
reversed. 






If the TBR field contains any value other than one of the values shown in Table C-5, the 
instruction form is invalid. 


Other registers altered: 


None 

mulhd Not Implemented in MPC601 mulhd 
Multiply High Double Word Integer Unit 
mulhd rD,rA,rB (Rc=0) 

mulhd. rD,rA,rB (Rc=1) 





0 S-6 10 11 15 16 20 21 22 30 31 


prod[0-—127]<—-rA*rB 
rD<—prod[0-63] 


The 64-bit multiplicands are rA and rB. The high-order 64 bits of the 128-bit product of 
the multiplicands are placed into rD. 


Both the multiplicands and the product are interpreted as signed integers. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
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Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
mulhdu _ Not impiemented in MPcé01 mulhdu 
Multiply High Double Word Unsigned Integer Unit 
mulhdu rD,rA,rB (Rc=() 
mulhdu. rD,rA,rB (Rc=1) 





0 5 6 10 11 15 16 20 21 22 30 31 


prod[0-127)<—-rA*rB 
rD<prod[0-63] 


The 64-bit multiplicands are rA and rB. The high-order 64 bits of the 128-bit product of 
the multiplicands are placed into rD. 


Both the multiplicands and the product are interpreted as unsigned integers. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
mulld Not Implemented in MPC601 mulid 
Multiply Low Double Word Integer Unit 
mulld rD,rA,rB (OE=0 Rc=0) 
mulld. rD,rA,rB (OE=0 Rc=1) 
mulldo rD,rA,rB (OE=1 Rc=0) 
mulldo. rDrA,rB  (OE=!1 Rc=1) 
10 11 15 16 20 21 22 30 31 
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prod[0-127]<-rA*rB 
rD<prod[64—127] 


The 64-bit operands are rA and rB. The low-order 64 bits of the 128-bit product of the 
operands are placed into rD. 


If OE=1, then SO and OV are set to one if the product cannot be represented in 64 bits. 


Both the operands and the product are interpreted as signed integers. However, the result in 
rD is independent of whether the operands are interpreted as signed or unsigned integers. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 

Affected: SO OV (if OE=1) 
ridcl Not Implemented in MPC601 ridcl 
Rotate Left Double Word then Clear Left Integer Unit 
ridcl rA,rS,rB,MB (Rc=0) 
ridcl. rA,rS,rB,MB (Rc=1) 

10 11 15 16 20 21 26 27 30 31 
N€rB[58-63] 


r—ROTL[64](rS, N) 
b¢é—-MBJ5] Il MB[O-4] 
m€—-MASK(b, 63) 
rA€—r &m 


The contents of rS are rotated[64] left the number of bits specified by rB[58-63]. A mask 
is generated having 1-bits from bit MB through bit 63 and ()-bits elsewhere. The rotated 
data is ANDed with the generated mask and the result is placed into rA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
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Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
rider Not Implemented in MPC601 rildcr 
Rotate Left Double Word then Clear Right Integer Unit 
rider rA,rS,rB,ME (Rc=0) 
rider. rA,rS,rB,ME (Rc=1) 
ee ee 
20 21 26 27 30 31 
N€rB[58-63] 


tROTL[64](rS, N) 
eME[S] ll ME{0-4] 
m€—-MASK(0, e) 
rA€qr & m 


The contents of rS are rotated[64] left the number of bits specified by rB[S58—63]. A mask 
is generated having 1-bits from bit 0 through bit ME and ()-bits elsewhere. The rotated data 
is ANDed with the generated mask and the result is placed into rA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
ridic Not Implemented in MPC601 ridic 
Rotate Left Double Word Immediate then Clear Integer Unit 
ridic rA,rS,SH,MB (Rc=0) 
ridic. rA,rS,SH,MB (Rc=1) 
ee eo ee 
10 11 20 21 26 27 29 30 31 


N€SH[5] |! SH[0-4] 
r-ROTL[64](rS, N) 
b<—-MB{S] Il MB[0-4] 
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m€—-MASK(b, -N) 
rA€—r & m 


The contents of rS are rotated[64] left SH bits. A mask is generated having 1-bits from bit 
MB through bit 63-SH and 0-bits elsewhere. The rotated data is ANDed with the generated 
mask and the result is placed into rA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rce=1) 
ridicl Not Implemented in MPC601 ridicl 
Rotate Left Double Word Immediate then Clear Left Integer Unit 
ridicl rA,rS,SH,MB (Rc=0) 
ridicl. rA,rS,SH,MB (Rc=1) 
a Ea Ee a ee 
10 11 15 16 20 21 26 27 29 30 31 


N¢SH[(S] ll SH[0-4] 
r¢-ROTL[64](rS, N) 
b¢—-MBjS] II MB[0-4] 
m¢—-MASK(b, 63) 
rAé—r &m 


The contents of rS are rotated[64] left SH bits. A mask is generated having 1-bits from bit 
MB through bit 63 and 0-bits elsewhere. The rotated data is ANDed with the generated 
mask and the result is placed into rA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
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ridicr Not Implemented in MPC601 ridicr 


Rotate Left Double Word Immediate then Clear Right Integer Unit 

ridicr rA,rS,SH,ME (Rc=0) 

ridicr. rA,rS,SH,ME (Rc=1) 

[ae ee ee ee a es ee 
10 11 15 16 20 21 26 27 29 30 31 


N€SH{S] ll SH[O-4] 
r¢-ROTL[64](rS, N) 
e—ME/[5] I! ME[0-4] 
m€—-MASK(0, e) 
rA¢r &m 


The contents of rS are rotated[64] left SH bits. A mask is generated having 1-bits from bit 
ME and ()-bits elsewhere. The rotated data is ANDed with the generated mask and the result 
is placed into rA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
¢ Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
ridimi Not Implemented in MPC601 rildimi 
Rotate Left Double Word Immediate then Clear Left Integer Unit 
ridimi rA,rS,SH,MB (Re=0) 
ridimi. rA,rS,SH,MB (Rc=1) 
Ee ea A eae AR ee 2 eee 
10 11 15 16 20 21 26 27 29 30 31 


N¢SH{[5] ll SH[O-4] 
t-ROTL[64](rS, N) 
b¢-MB[5] !l MB[0-4] 
m¢—-MASK(b, -N) 
rA<(r & m) 1 (rA & -m) 
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The contents of rS are rotated[64] left SH bits. A mask is generated having 1-bits from bit 
MB through bit 63-SH and 0-bits elsewhere. The rotated data is inserted into rA under 
control of the generated mask. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 


sibia Not Implemented in MPC601 sibia 
SLB Invalidate All 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


All SLB entries¢—invalid 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


The SLB is invalidated regardless of the settings of MSR[IR] and MSR[DR]. 
This instruction is supervisor-level. 
This instruction is optional in PowerPC architecture. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause an illegal instruction type program interrupt. 


It is not necessary that the ASR point to a valid segment table when issuing slbia. 


Other registers altered: 


None 
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sibie Not Implemented in MPC601 sibie 
SLB Invalidate Entry 


slbie rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


EA¢€-(rB) 
if SLB entry exists for EA, then 
SLB entry¢—invalid 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


EA is the contents of rB. If the segment lookaside buffer (SLB) contains an entry 
corresponding to EA, that entry is made invalid (i.e., removed from the SLB). 


The SLB search is done regardless of the settings of MSR[IR] and MSR[DR]. 
Block address translation for EA, if any, is ignored. 

This instruction is supervisor-level. 

This instruction is optional in PowerPC architecture. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause an illegal instruction type program interrupt. 


Other registers altered: 
None 


It is not necessary that the ASR point to a valid segment table when issuing slbie. 
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silbiex Not Implemented in MPC601 sibiex 
SLB Invalidate Entry by Index 


slbiex rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


N(rB) 
SLB entry N¢-invalid 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


The SLB entry is invalidated regardless of the settings of MSR[IR] and MSR[DR]. 
This instruction is supervisor-level. 
This instruction is optional in PowerPC architecture. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause an illegal instruction type program interrupt. 


Other registers altered: 


None 

sid Not Implemented in MPC601 sid 

Shift Left Double Word Integer Unit 

sld rA,rS,rB (Rce=0) 

sid. rA,rS,rB (Rc=1) 

SEE a (a ee a Ee eee eee LS 
10 11 15 16 20 21 30 31 

NerB[58-63] 


r—ROTL[64](rS, N) 

if rB[57]=0 then 
m<—MASK(0, 63-N) 

else m¢—(64)0 

rA¢-r & m 
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The contents of rS are shifted left the number of bits specified by rB[57—63]. Bits shifted 
out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The result 
is placed into rA. Shift amounts from 64 to 127 give a zero result. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
¢« Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
srad Not Implemented in MPC601 srad 
Shift Right Algebraic Double Word Integer Unit 
srad rA,rS,rB (Rc=0) 
srad. rA,rS,rB (Rc=1) 


S 
0 6 10 11 15 16 20 21 30 31 


N€rB[58-63] 

r<-ROTL{64](rS, 64-N) 

if rB{[S7]=0 then 

m€-MASK\N, 63) 

else m¢—(64)0 

s¢-rS[0] 

rA€e(r & m)! (((64)s) & =n) 

CAés & ((r&—1n)#0) 
The contents of rS are shifted right the number of bits specified by rB[57-63]. Bits shifted 
out of position 63 are lost. Bit 0 of rS is replicated to fill the vacated positions on the left. 
The result is placed into rA. CA is set to 1 if rS is negative and any 1-bits are shifted out 
of position 63; otherwise CA is set to 0. A shift amount of zero causes rA to be set equal to 
rS, and CA to be set to 0. Shift amounts from 64 to 127 give a result of 64 sign bits in rA, 
and cause CA to receive the sign bit of rS. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
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5 


Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 
Affected: CA 


sradi Not Implemented in MPC601 sradi 

Shift Right Algebraic Double Word Immediate Integer Unit 

sradi rA,rS,SH (Rc=0) 

sradi. rA,rS,SH (Rc=1) 

Be ee a nce ca ese ee 
10 11 15 16 20 21 30 31 


N¢€SH{[5] Il SH[O-4] 

r-ROTL[64](rS, 64-N) 

m¢-MASK\N, 63) 

s€rS[0] 

rA(r & m) | (((64)s) & +n) 

CA¢s & ((r&—m) #0) 
The contents of rS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of 
rs is replicated to fill the vacated positions on the left. The result is placed into rA. CA is 
set to 1 if rS is negative and any 1-bits are shifted out of position 63; otherwise CA is set 
to (). A shift amount of zero causes rA to be set equal to rS, and CA to be set to 0). 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
¢ Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO (if Rc=1) 
¢ Exception Register: 
Affected: CA 
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srd Not Implemented in MPC601 srd 


Shift Right Double Word Integer Unit 
srd rA,rS,rB (Rc=0) 
srd. rA,rs,rB (Rc=1) 
eee ee 
10 11 15 16 20 21 30 31 
N€rB[58-63] 


1¢-ROTL[64](rS, 64-N) 
if rB[S7]=0 then 
m¢—-MASKV(N, 63) 

else m¢€—(64)0 

rA¢-+r & m 
The contents of rS are shifted right the number of bits specified by rB[57-63]. Bits shifted 
out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The result 
is placed into rA. Shift amounts from 64 to 127 give a zero result. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 
¢« Condition Register (CRO Field): 


Affected: LT, GT, EQ, SO (if Rc=1) 
std Not Implemented in MPC601 std 
Store Double Word Integer Unit 
std rS,ds(rA) 
a ae ae ee el Se 7 
0 5 6 10 11 15 16 29 30 31 


if rA=0 then b¢—0 

else berA 

EA@b + EXTS(dsll0b00) 

(MEM(BA, 8))<—rS 
EA is the sum (rAlQ)+(dsllOb00). Register rS is stored into the double word in storage 
addressed by EA. 
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This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

stdcx. Not Implemented in MPC601 stdcx. 
Store Double Word Indexed Integer Unit 
stdex. rs,rA,rB 

0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b¢-0 
else be-rA 
EA¢—b+rB 
if RESERVE then 
(MEMC(EA, 8))<—rS 
RESERVE<-0 
CRO<0b00 Il Ob! Il XER[SO] 
else 


CRO€-0b00 II Ob0 Il XER[SO] 
EA is the sum (rAI0)+(rB). 


If a reservation exists, rS is stored into the double word in storage addressed by EA and the 
reservation is cleared. 


If a reservation does not exist, the instruction completes without altering storage. 


CRO Field is set to reflect whether the store operation was performed (i.e., whether a 
reservation existed when the stdcx. instruction commenced execution), as follows: 


CRO[LT GT EQ SO] + 0600 II store_performed |] XER[SO] 


EA must be a multiple of 8. If it is not, the system alignment error handler may be invoked 
or the results may be boundedly undefined. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 
Other registers altered: 
* Condition Register (CRO Field): 
Affected: LT, GT, EQ, SO 
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stdu Not Implemented in MPC601 stdu 
Store Double Word with Update Integer Unit 


stdu rS,ds(rA) 


ee ee ae ee eee Ee 
5 6 


0 10 11 15 16 29 30 31 


EArA+EXTS(dsll0b00) 
(MEM(EA, 8))<—rS 
rACEA 


EA is the sum (rA)+(dsllOb00). Register rS is stored into the double word in storage 
addressed by EA. 


EA is placed into rA. 
If rA=0, the instruction form is invalid. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

stdux Not Implemented in MPC601 stdux 
Store Double Word with Update Indexed Integer Unit 
stdux rs,rA,rB 


Reserved 





EA€rA+rB 
MEMC(EA, 8)<-rS 
rA¢-EA 


EA is the sum (rA)+(rB). Register rS is stored into the double word in storage addressed 
by EA. 


EA is placed into rA. 


If rA=0, the instruction form is invalid. 
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This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

stdx Not Implemented in MPC601 stdx 
Store Double Word Indexed Integer Unit 
stdx rs,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


if rA=0 then b¢-0 

else be-rA 

EA@b+rB 

(MEM(EA, 8))<-rS 
EA is the sum (rAl0)+(rB). Register rS is stored into the double word in storage addressed 
by EA. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 

stfiwx Not Implemented in MPC601 stfiwx 
Store Floating-Point as Integer Word Floating-Point Unit 
stfiwx frS,rA,rB 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 
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if rA= 0 then bé—0 

else berA 
EA@b+rB 

MEMC(EA, 4)<—frS[32-63] 


EA is the sum (rAIQ)+(rB). 


The contents for the low-order 32 bits of register frS are stored, without conversion, into 
the word in storage addressed by EA. 


Other registers altered: 


None 

td Not Implemented in MPC601 td 
Trap Double Word Integer Unit 
td TO,rA,rB 


Reserved 





acrA 

be-rB 

if (a < b) & TO[O] then TRAP 
if (a> b) & TO[1] then TRAP 
if (a =b) & TO[2] then TRAP 
if (a u< b) & TO[3] then TRAP 
if (a u> b) & TO[4] then TRAP 


The contents of rA is compared with the contents of rB. If any bit in the TO field is set to 
1 and its corresponding condition.is met by the result of the comparison, then the system 
trap handler is invoked. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 
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tdi Not Implemented in MPC601 tdi 
Trap Double Word Immediate Integer Unit 


tdi TO,rA,SIMM 


6 


0 5 10 11 15 16 31 


acrA 

if (a < EXTS(SIMM)) & TO[0] then TRAP 

if (a> EXTS(SIMM)) & TO[1] then TRAP 

if (a= EXTS(SIMM)) & TO[2] then TRAP 

if (a u< EXTS(SIMM)) & TO[3] then TRAP 

if (a u> EXTS(SIMM)) & TO[4] then TRAP 
The contents of rA are compared with the sign-extended SIMM field. If any bit in the TO 
field is set to 1 and its corresponding condition is met by the result of the comparison, then 
the system trap handler is invoked. 


This instruction is defined only for 64-bit implementations. Using it on a 32-bit 
implementation will cause the system illegal instruction error handler to be invoked. 


Other registers altered: 


None 
tlbia Not Implemented in MPC601 tlbia 
Translation Lookaside Buffer Invalidate All Integer Unit 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


All TLB entries <— invalid 


This PowerPC instruction is not implemented by the MPC6()1. Execution of this instruction 
will invoke the illegal instruction handler. A description of the operation of this instruction 
is provided for emulation purposes. 


The entire TLB is invalidated (i.e., all entries are removed). 
The TLB is invalidated regardless of the settings of MSR[IT] and MSR[DT]. 


This is a supervisor-level instruction. 
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This instruction is optional in PowerPC architecture. 


Other registers altered: 


None 


It is not necessary that the ASR point to a valid segment table when issuing tlbia. 





tlbiex Not Implemented in MPC601 tibiex 
Translation Lookaside Buffer Invalidate Entry by Index Integer Unit 
tibiex rB 
Reserved 
0 5 6 10 11 15 16 20 21 30 31 
Ne€(rB) 
TLB entry N€-invalid 


This PowerPC instruction is not implemented by the MPC601. Execution of this instruction 


will invoke the illegal instruction handler. A description of the operation of this instruction 


is provided for emulation purposes. 


Let N be the contents of rB. The Nth TLB entry is made invalid (i.e., removed from the 
TLB). The TLB entry is invalidated regardless of the settings of MSR[IT] and MSR[DT]. 
If the Nth SLB does not exist, the results are implementation defined. 


This instruction is supervisor-level. 
This instruction is optional in PowerPC architecture. 


Other registers altered: 


None 


How software knows which TLB entry number is associated with which page table entry, 
or even how many TLB entries there are, is not specified in the architecture. This may differ 
among PowerPC processors. 


It is not necessary that the ASR point to a valid segment table when issuing tlbiex. 
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tlbSsync _Not Implemented in MPC601 tlbsync 
TLB Synchronize 


Reserved 





0 5 6 10 11 15 16 20 21 30 31 


The tlbsync instruction waits until all previous tlbie, tIbiex, and tlbia instructions executed 
by the processor executing this instruction have been received and completed by all other 
processors. 


This instruction is supervisor-level. 


This instruction is optional in PowerPC architecture, but it must be implemented if any of 
the following are true: 


¢ ATLB invalidation instruction that broadcasts is implemented. 
¢ The eciwx or ecowx instructions are implemented. 
Other registers altered: 


None 
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Appendix D 
Classes of Instructions 


This appendix describes how the classes of PowerPC instructions are defined.The three 
classifications are as follows: 


¢ Defined 
° Illegal 
e Reserved 


Note that while the definitions of these terms are consistent among the PowerPC 
processors, the assignment of these classifications is not. For example, an instruction that 
is specific to 64-bit implementations is considered defined for 64-bit implementations, but 
illegal for 32-bit implementations such as the MPC601. 


D.1 Classes of Instructions 


The MPC601 is a 32-bit implementation of the PowerPC architecture with differences and 
redefinitions noted throughout this document . Differences stem largely from the different 
address bus sizes and compliance with POWER architecture. 


All MPC6()1 instructions belong to one of the following three classes: 
¢ Defined 
° Illegal 
¢ Reserved 
The class is determined by examining the opcode and the extended opcode, if any. If the 


opcode, or combination of opcode and extended opcode, is not that of a defined instruction 
nor of a reserved instruction, the instruction is illegal. 


In future versions of the PowerPC architecture, instructions that are now illegal may 
become defined (by being added to the architecture) or reserved (by being assigned to one 
of the special purposes). Likewise, reserved instructions may become defined. 


D.1.1 Defined Instruction Class 


Defined instructions are guaranteed to be supported in all PowerPC implementations, 
except as stated in the instruction descriptions in Chapter 10, “Instruction Set.” The 
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MPC601 provides hardware support for most of the instructions defined for 32-bit 
implementations; it does not provide direct hardware support for the instructions listed in 
Appendix C, “PowerPC Instructions Not Implemented in MPC601.” 


The MPC601 invokes the system illegal instruction error handler (part of the program 
exception) when the unimplemented PowerPC instructions are encountered so they may be 
emulated in software, as required. 


A defined instruction can have invalid forms, as described in Section D.1.1.1, “Invalid 
Instruction Forms.” 


D.1.1.1 Invalid Instruction Forms 

An instruction form is invalid if one or more operands, excluding opcodes, are coded 
incorrectly. Attempting to execute an invalid form of an instruction either invokes the 
system illegal instruction error handler (a program exception) or yields undefined results. 
See Chapter 10, “Instruction Set,” for individual instruction descriptions. 


Invalid forms result when a bit or operand is coded incorrectly, for example, or when a 
reserved bit is shown as “(Q” but is coded as a “1”. The following instructions have invalid 
forms identified in their individual instruction descriptions: 


¢ Branch conditional instructions 

¢ Load/store with update instructions 

¢ Load multiple instructions 

¢ Load string instructions 

¢ Move to/from special purpose register (mtspr, mfspr) 
* Load/store floating-point with update instructions 


In some cases, an invalid form of a PowerPC instruction is not an invalid form for the 
corresponding POWER instruction. As a result, to maintain compatibility with POWER 
applications, the MPC601 often handles PowerPC invalid forms as described in the 
POWER architecture. In other cases, the MPC601 handles the invalid form in the manner 
that is most convenient for that particular case. Each of the PowerPC invalid forms are 
addressed in this document, and a description of how MPC6()1 handles each case is 
provided. 


D.1.2 Illegal Instruction Class 
Illegal instructions can be grouped into the following categories: 


¢ Instructions that are not implemented in the PowerPC architecture. These opcodes 
are available for future extensions of the PowerPC architecture; that is, future 
versions of the PowerPC architecture may define any of these instructions to 
perform new functions. The following primary opcodes are illegal: 


1, 4, 5, 6, 56, 57, 60, 61 
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¢ Instructions that are implemented in the PowerPC architecture but are not 
implemented in a specific PowerPC implementation (for example, instructions that 
can be executed on 64-bit PowerPC processors are considered illegal for 32-bit 
processors. 


The following opcodes are defined for 64-bit implementations only and are illegal 
on the MPC6()1: 


2, 30, 58, 62 


*The following primary opcodes have unused extended opcodes. Their unused 
extended opcodes can be determined from information in Section A.2, “PowerPC 
Instruction List Sorted by Opcode,” and Section D.1.3, “Reserved Instructions.” 
Notice that extended opcodes for instructions that are defined only for 64-bit 
implementations are illegal in 32-bit implementations. All unused extended 
opcodes are illegal. 


19, 31, 59, 63 (opcodes 30 and 62 are illegal for all 32-bit implementations, but as 
64-bit opcodes have some unused extended opcodes). 


An attempt to execute an illegal instruction invokes the illegal instruction error handler (a 
program exception) but has no other effect. See Section 5.4.7, “Program Exception 
(x’00700’),” for additional information about illegal and invalid instruction exceptions. 


Note that an instruction consisting entirely of binary zeros is guaranteed to be an illegal 
instruction. This increases the probability that an attempt to execute data or uninitialized 
memory invokes the system illegal instruction error handler (a program exception). Note 
that if only the primary opcode consists of all zeros, the instruction is considered a reserved 
instruction, as described in Section D.1.3, “Reserved Instructions.” 


With the exception of the instruction consisiting entirely of binary zeros, the illegal 
instructions are available for further additions to the PowerPC architecture. 


D.1.3 Reserved Instructions 


Reserved instructions are allocated to specific purposes outside the scope of the PowerPC 
architecture. An attempt to execute a reserved instruction either causes a program exception 
or yields undefined results. 


An attempt to execute a reserved instruction invokes the illegal instruction error handler (a 
program exception); however, the MPC60lexecutes many POWER architecture 
instructions that otherwise are not part of the PowerPC architecture. See Section 5.4.7, 
“Program Exception (x’00700’),” for additional information about illegal and invalid 
instruction exceptions. 


The instructions in this class are allocated to specific purposes that are outside the scope of 
the PowerPC user instruction set architecture, PowerPC virtual environment architecture, 
and PowerPC operating environment architecture. 
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The following types of instructions are included in this class: 


1. 


D-4 


Instructions for the POWER architecture that have not been included in the 
PowerPC user instruction set architecture 


Implementation-specific instructions used to conform to the PowerPC 

architecture specifications 

The instruction with primary opcode 0), when the instruction does not consist 
entirely of binary zeros 

Any other implementation-specific instructions that are not defined in the PowerPC 
user instruction set architecture, PowerPC virtual environment architecture, or the 
PowerPC operating environment architecture 
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Appendix E 
Multiple-Precision Shifts 


This appendix gives examples of how multiple precision shifts can be programmed. A 
multiple-precision shift is initially defined to be a shift of an n-word quantity, where n>1. 
The quantity to be shifted is contained in ” registers. The shift amount is specified either by 
an immediate value in the instruction or by bits 27-31 of a register. 


The examples shown below distinguish between the cases n = 2 and n> 2. If n =2, the shift 
amount may be in the range 0-63, which are the maximum ranges supported by the shift 
instructions used. However if n > 2, the shift amount must be in the range 0-31, for the 
examples to yield the desired result. The specific instance shown for n > 2 is n = 3: 
extending those instruction sequences to larger n is straightforward, as is reducing them to 
the case n = 2 when the more stringent restriction on shift amount is met. For shifts with 
immediate shift amounts only the case n = 3 is shown, because the more stringent 
restriction on shift amount is always met. 


In the examples it is assumed that GPRs 2 and 3 (and 4) contain the quantity to be shifted, 
and that the result is to be placed into the same registers. In all cases, for both input and 
result, the lowest-numbered register contains the highest-order part of the data and highest- 
numbered register contains the lowest-order part. For non-immediate shifts, the shift 
amount is assumed to be in bits 27-31 (32-bit mode) of GPR6. For immediate shifts, the 
shift amount is assumed to be greater than (. GPRs 0-31 are used as scratch registers. For 
n > 2, the number of instructions required is 2N-1 (immediate shifts) or 3N-1 (non- 
immediate shifts). 


E.1 Multiple-Precision Shift Examples 


The examples shown here are for 32-bit mode, but they work both in 32-bit mode of a 64- 
bit implementation and in a 32-bit implementation. They perform the shift in units of 
words. If the ability to run in 32-bit implementations is not required, in a 64-bit 
implementation better performance can be obtained in 32-bit mode than that of the 
examples shown above, by using all 64 bits of GPRs 2 and 3 (and 4) to contain the quantity 
to be shifted, and placing the result into all 64 bits of the same registers. 


Let n be the number of words to be shifted. 
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Shift Left Immediate, n = 3 (Shift Amount < 32) 
rlwinm r2,r2,SH,0,31-SH 
rlwimi r2,r3,SH,32-SH,31 
rlwinm r3,r3,SH,0,31-SH 
rlwimi r3,r4,SH,32-SH,31 
rlwinm r4,r4,SH,0,31-SH 


Shift Left, n = 2 (Shift Amount < 64) 
subfic r31,r6,32 


slw r2,r2,r6 
srw r0,r3,r31 
or r2,r2,r0 


addic r31,r6,r6 
slw r0,r3,r31 
or r2,r2,xr0 
slw r3,r3,r6 


Shift Left, n = 3 (Shift Amount < 32) 
subfic r31,r6,32 


slw r2,r2,r6 
srw r0,r3,r31 
or r2,r2,r0 
slw r3,r3,6 
srw r0,r4,r31 
or r3,r3,r0 
slw r4,r4,r6 


Shift Right Immediate, n = 3 (Shift Amount < 32) 
rlwinm r4,r4,32-SH,SH,31 

rlwimi r4,r3,32-SH,0,SH-1 

rlwinm r3,r3,32-SH,SH,31 

rlwimi r3,r2,32-SH,0,SH-1 

rlwinm r2,r2,32-SH,SH,31 


Shift Right, n = 2 (Shift Amount < 64) 
subfic r31,r6,32 


srw r3,r3,r6 
slw r0,r2,r31 
or r3,r3,r0 
addic r31,r6,-32 
srw r0,r2,r31 
or r3,xr3,r0 
srw r2,r2,r6 


Shift Right, n = 3 (Shift Amount < 32) 
subfic r31,r6,32 


srw r4,r4,r6 
slw r0,r2,r31 
or r4,r4,r0 
srw r31,r3,r6 
slw r0,r2,r31 
or r3,r3,r0 
srw r2,xr2,7r6 
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Shift Right Algebraic Immediate, n = 3 (Shift Amount < 32) 


rlwinm r4,r4,32-SH,SH,31 

rlwimi r4,r3,32-SH,0,SH-1 
rlwinm r3,r3,32-SH,SH, 31 

rlwimi r3,r2,32-SH,0,SH-1 
srawi r2,r2,SH 


Shift Right Algebraic, n = 2 (Shift Amount < 64) 
subfic r31,r6,32 


srw r3,r3,r6 
slw r0,r2,r31 
or r3,7r3,r0 


addic. r31,r6,-32 
sraw r0,r2,r31 
ble $+8 

ori r3,r0,0 
sraw r2,r2,r6 


Shift Right Algebraic, n = 3 (Shift Amount < 32) 
subfic r31,r6,32 


srw r4,r4,r6 
slw r0,r3,r31 
or r4,r4,r0 
srw r3,r3,r6 
slw r0,r2,r31 
or r3,r3,r0 


sraw r2,r2,r6 
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Appendix F 
Floating-Point Models 


This appendix gives examples of how the floating-point conversion instructions can be 
used to perform various conversions. 


F.1 Conversion from Floating-Point Number to 
Signed Fixed-Point Integer Word 


The full convert to signed fixed-point integer word function can be implemented with the 
sequence shown below, assuming that the floating-point value to be converted is in FPR1, 
the result is returned in GPR3, and a double word at displacement "disp" from the address 
in GPR1 can be used as scratch space. 


fctiw[(z]f£2,f1 #convert to £x int 
stfd £2,disp(r1l) #store float 
lwz r3,disp+4(r1) #load word and zero 


F.2 Conversion from Floating-Point Number to 
Unsigned Fixed-Point Integer Word 


The full convert to unsigned fixed-point integer word function can be implemented with the 
sequence shown below, assuming that the floating-point value to be converted is in FPR1, 
the value 0 is in FPRO, the value 2°? is in FPR3, the value x’0000 0000 7FFF FFFF’ is in 
FPR4, the value 231 is in FPRS and GPRS, the result is returned in GPR3, and a double 
word at displacement "disp" from the address in GPR1 can be used as scratch space. 


fmr £2,f0 fuse 0 if < 0 

fempu cr2,f1,f0 

bl ecr2,store 

fmr £2,f£4 #use max if > max 
fempu cr2,f1,f3 

bgt cr2,store 

fsub £2,£1,£5 #subtract 2**31 
fempu cr2,f1,f£5 fuse Giff if 2 2**31 


bnl cr2,$+8 
fmr £2,f£1 


fctiw[z]f£2,f£2 #convert to £x int store- 
stfd £2,disp(r1l) #store float 

lwz r3,disp+4(r1) #load word 

bl cr2,$+8 #add 2**31 if input 

add r3i,03;-%5 #fwas 2 2**31 
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F.3 Floating-Point Models 


This section describes models for floating-point instructions. 


F.3.1 Floating-Point Round to Single-Precision Model 


The following algorithm describes the operation of the Floating-Point Round to Single- 
Precision (frsp) instruction. 


If FRB[1-11]<897 and FRB[1-63]>0 then 
Do 
If FPSCR[UE]=0 then goto Disabled Exponent Underflow 
If FPSCR[UE]=1 then goto Enabled Exponent Underflow 
End 


If FRB[(1-11]>1150 and FRB[1-11)]<2047 then 
Do 
If FPSCR[OE]=0 then goto Disabled Exponent Overflow 
If FPSCR[OE]=1 then goto Enabled Exponent Overflow 
End . 


If FRB[1l-11]>896 and FRB[1-11]<1151 then goto Normal Operand 
If FRB[1-63]=0 then goto Zero Operand 


If FRB[1-11]=2047 then 
Do 
If FRB[(12-63]=0 then goto Infinity Operand 
If FRB{12]=1 then goto QNaN Operand 
If FRB[(12]=0 and FRB[13-63]J>0 then goto SNaN Operand 
End 


Disabled Exponent Underflow: 


sign < FRBO 
If FRB[1-11]=0 then 


Do 
exp ¢< -1022 
frac ¢ b'0' [| FRB[12-63] 
End 
If FRB[1l-11]>0 then 
Do 
exp ¢ FRB[1-11] - 1023 
frac ¢— b’l1’ || FRB[12-63] 
End 
Denormalize operand: 
Gli R{II| X¥ © b’000' 


Do while exp<-126 
exp ¢< exp + 1 


frac || Gil RI] X —& b'O' |{ frac [| G I{ (R [ X) 
End 
FPSCR[UX] < frac{24-52] || GJ] R || X>0 
If frac{24-52] || G || R [|] X>0 then FPSCR[XX] <1 


Round single(sign,exp, frac,G,R,X) 
If frac=0 then 


Do 
FRTOO © sign 
FRTO[1-63] < 0 
If sign=0 then FPSCR{FPRF] < "+zero" 
If sign=1 then FPSCR[FPRF] < "-zero" 





F-2 PowerPC 601 RISC Microprocessor User's Manual MOTOROLA 


End 
If frac>0O then 


Do 
If frac[0)=1 then 
Do 
If sign=0 then FPSCR[FPRF] <« "“+normal number" 
If sign=1 then FPSCR[FPRF] «< "-normal number" 
End 
If frac[{[0j]=0 then 
Do 
If sign=0 then FPSCR[FPRF] « "+denormalized number" 
If sign=1 then FPSCR[FPRF] < "-denormalized number" 
End 
Normalize operand- 
Do while frac[0]=0 
exp < exp-l 
frac || Gl] R © frac[{1-52] || Gdl R |l{ b’0’ 


End 
FRT[0] € sign 
FRT(1-11] < exp + 1023 
FRT(12-63] €— frac[1-23] || b’0 0000 0000 0000 0000 0000 0000 0000’ 
End 
Done 


Enabled Exponent Underflow 


FPSCR[UX] < 1 
sign « FRB[0] 
If FRB[1-11]=0 then 


Do 
exp <¢ -1022 
frac <— b'0O' |! FRB[12-63] 
End 
If FRB({1-11]>0 then 
Do 
exp ¢ FRB[1-11] - 1023 
frac € b’l’ || FRB[12-63] 
End 


Normalize operand- 
Do while frac{0]=0 
exp < exp - 1 
frac ¢ frac[1-52] || b’0' 
End 
If frac({24-52]>0 then FPSCR[XX] € 1 
Round single(sign,exp,frac,0,0,0) 
exp < exp + 192 
FRT[(O] © sign 
FRT(1-11]) < exp + 1023 
FRT [12-63] < fracl-23 || b‘'0 0000 0000 0000 0000 0000 0000 0000’ 
If sign=0 then FPSCR[FPRF] < "+4+normal number" 
If sign=1 then FPSCR([FPRF] «< "“-normal number" 
Done 


Disabled Exponent Overflow 


inc <0 
FPSCR[OX] ¢< 1 
FPSCR[XX] < 1 
If FPSCR[RN]= b’00' then/* Round to Nearest */ 
Do 
inc <0 
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If FRB[0]=0 then FRT ¢< x’7FFO 0000 0000 0000 
If FRB[0]=1 then FRT <¢<x'FFFO 0000 0000 0000’ 
If FRB[0]=0 then FPSCR[FPRF] €< "+infinity" 
If FRB[O}J=1 then FPSCR[FPRF] < "-infinity" 

End 

If FPSCR[RN]=b‘01’ then/* Round Truncate */ 

Do 

If (b‘'0’ || FRB[1-63]) < x’047EF "FFF E000 0000’ then inc <0 


If FRB[0]=0 then FRT ¢x’47EF FFFF E000 0000’ 
If FRB[O]=1 then FRT €x’'C7EF FFFF E000 0000’ 
If FRB[0]=0 then FPSCR[FPRF] €< "+normal number" 
If FRB[O)]=1 then FPSCR[FPRF] << "-normal number" 

End 

If FPSCR[RN]= b’10' then /* Round to +Infinity */ 

Do 
If FRB[0]=0 then inc €¢€ 0 
If (FRB(O]=1 & (FRB > x'C7EF FFFF E000 0000’ then inc € 1) 


If FRB[0]=0 then FRT € x’7FFO 0000 0000 0000’ 
If FRB[O]=1 then FRT ¢ x’‘C7EF FFFF E000 0000’ 
If FRB[0]=0 then FPSCR[FPRF] < "+infinity" 
If FRB[O]=1 then FPSCR[FPRF] <— "-normal number" 
End 
If FPSCR[RN]=b’11' then/* Round to -Infinity */ 
Do 
(If FRB[0]=0 & FRB < x'47EF FFFF E000 0000’) then inc ¢ 1 
If FRB{O0J= 1 then inc ¢€ 1 
If FRB[0]=0 then FRT ¢< x‘47EF FFFF E000 0000’ 
If FRB[0O)=1 then FRT ¢ x‘'FFFO 0000 0000 0000’ 
If FRB[0]=0 then FPSCR[FPRF] < "+4+normal number" 
If FRB[OJ=1 then FPSCR[FPRF] < "“-infinity" 
End 
FPSCR[FR] < inc 
FPSCR[FI] ¢< 1 
Done 


Enabled Exponent Overflow 


sign ¢ FRB[0] 
exp ¢< FRB[1-11] - 1023 
frac ¢ b’l’ {| [12-63] 
If frac{[24-52]>0 then FPSCR[XX] < l1 
Round single(sign,exp,frac,0,0,0) 
Enabled Overflow 
FPSCR[OX] €< 1 
exp < exp - 192 
FRT(O] €< sign 
FRT[1-11] < exp + 1023 
FRT [12-63] «< frac[{1-23] |] b’0 0000 0000 0000 0000 0000 0000 0000’ 
If sign=0 then FPSCR[FPRF] < "4+normal number" 
If sign=l then FPSCR[FPRF] < "“-normal number" 


Done 

Zero Operand 

FRT < FRB 

If FRB[0]=0 then FPSCR[FPRF] « "+zero" 
Tf°FRB[(O]=1 then FPSCR[FPRF] < "-zero" 


FPSCR[FR FI] € b’‘00’ 
Done 
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Infinity Operand 


FRT <- FRB 

If FRB[{0]=0 then FPSCR[FPRF] ¢ "+infinity" 

If FRB[O]=1 then FPSCR[FPRF] < "-infinity" Done 

QNaN Operand- 

FRT < FRB[0-34) |! b’O 0000 0000 0000 0000 0000 0000 0000’ 
FPSCR[FPRF] ¢< “QNaN" 

FPSCR[FR FI] < b’00' 


Done 
QNaN Operand 
FRT < FRB[0-34] || b‘0 0000 0000 0000 0000 0000 0000 0000' 


FPSCR[FPRF] <— "QNaN" 
FPSCR[FR FI] € b’‘00’ 
Done 


SNaN QOperand 


FPSCR[VXSNAN] € 1 
If FPSCR[VE]=0 then 
Do 
FRT(0-11]) « FRB[(O0-11] 
FRT([12] ¢ 1 
FRT(13-63] © FRB[13-34] || b’0 0000 0000 0000 0000 0000 0000 
0000’ 
FPSCR[FPRF] < "QNaN" 
End 
FPSCR[FR FI] < b’00' 
DDone 


Normal Operand 


sign ¢< FRB[0] 

exp < FRB[1-11] - 1023 

frac ¢ b’1’ {| FRB[12-63] 

If frac{24-52]>0 then FPSCR[XX] < Il 

Round single(sign,exp,frac,0,0,0) 

If exp>+127 and FPSCR[OE]=0 then go to Disabled Exponent Overflow 
If exp>+127 and FPSCR[OE]=1 then go to Enabled Overflow 

FRT(0] < sign 

FRT[1-11] <— exp + 1023 

FRT [12-63] < frac[{1-23] || b’0 0000 0000 0000 0000 0000 0000 0000" 
If sign=0 then FPSCR[FPRF] < "+normal number" 

If sign=1 then FPSCR[FPRF] < "-normal number" 

Done 


Round Single (sign,exp,frac,G,R,X) 


inc € 0 
lsb ¢ frac[23]} 
gbit ¢« frac[24] 
rbit <frac[25] 
xbit ¢€ (frac[26-52]|/G||RI|X)#0 
If FPSCR[RN]=b’‘00' then 

Do 

If sign || lsb || gbit rbit || xbit 


|| b/ulluu’ then inc € 1 
If sign || lsb || gbit || rbit || xbit 
| | 


b/u0llu’ then inc €<1 


If sign || lsb || gbit rbhit || xbit = b’u0lul’ then inc <1 
End 
If FPSCR[RN]= b’10’ then 
Do 
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If sign || lsb |] gbit |{ rbit [| xbit = b'0uluu’ then inc ¢ 1 
If sign || lsb || gbit ||] rbit || xbit = b’Ouulu’ then inc ¢ 1 
If sign || lsb 1/1! gbit || rbit || xbit = b’0Ouuul’ then inc € 1 
End 
If FPSCR[RN]= b’11’ then 
Do 


b’luluu’ then inc € 1 
b‘luulu’ then inc € 1 
b’luuul’ then inc € 1 


If sign || lsb |] gbit || rbit || xbit 
If sign || lsb || gbit ||] rbit || xbit 
If sign |1 lsb || gbit || rbit || xbit 
End 
frac({[0-23] «€ frac[{0-23] + inc 
If carry_out=1 then 


Do 
frac([0-23] ¢ b’l’ [| frac[0-22] 
exp < exp + 1 

End 


FPSCR[FR] ¢€ inc 
FPSCR(FI]) ¢ gbit | rbit | xbit 
Return 


F.3.2 Floating-Point Convert to Integer Model 


The following algorithm describes the operation of the floating-point convert to integer 
instructions. In this example, u represents an undefined hexadecimal digit.. 


If Floating Convert to Integer Word 
Then Do 
Then round_mode ¢ FPSCR[RN] 
tgt_precision ¢ "32-bit integer" 
End 
If Floating Convert to Integer Word with round toward Zero 
Then Do 
round_mode ¢ b’01’ 
tgt_precision ¢ "32-bit integer" 
End 
If Floating Convert to Integer Doubleword 
Then Do 
round_mode < FPSCR[RN] 
tgt_precision «< "64-bit integer" 
End 
If Floating Convert to Integer Doubleword with round toward Zero 
Then Do 
round_mode ¢< b’0Ol1’ 
tgt_precision ¢ "64-bit integer" 
End 
If FRB[1-11]=2047 and FRB[12-63]=0 then goto Infinity Operand 
If FRB(1-11]=2047 and FRB12=0 then goto SNaN Operand 
If FRB[1~-11]=2047 and FRB12=1 then goto QNaN Operand 
If FRB[1-11]>1086 then goto Large Operand 


sign « FRBO 

If FRB[{1-11]>0 then exp ¢ FRB[{1-11] - 1023 /* exp - bias */ 

If FRB[1-11]=0 then exp ¢ -1022 

If FRB[1-11]>0 then frac[0-64]¢b’01’|]|/FRB[(12-63]||b’00000000000’' 
/*normal* / 

If FRB[1-11]=0 then frac[0-64]<b’00’||/FRB[12-63]|]b’00000000000' 
/*denormal*/ 
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gbit || rbit || xbit ¢ b’000’ 
Do i=1,63-exp 
frac([0-64] || gbit || rbit || xbit ¢ b'0' [| frac({0-64] || gbit || 
(rbit|xbit) 
End 


If gbit | rbit | xbit then FPSCR[XX] < 1 


Round Integer (frac,gbit,rbit,xbit,round_mode) 
In this example, u represents an undefined hexadecimal digit. Comparisons ignore the u 
bits. 


If sign=l1 then frac[0-64] «€ -frac[0-64] + 1 


If tgt_precision="32-bit integer" and frac[0-64]>4+2(31)-1 
then goto Large Operand 
If tgt_precision="64-bit integer" and frac[0-64]>+2(63)-1 
then goto Large Operand 
If tgt_precision="32-bit integer" and frac [0-64]<-2 (31) then goto Large 
Operand 
Lf tgt_precision="64-bit integer" and frac[0-64]<-2(63) then goto Large 
Operand 
If tgt_precision="32-bit integer" 
then FRT © x’xuuuuuuu’ || frac[{33-64] 
If tgt_precision="64-bit integer" then FRT ¢€ frac[{1-64] 
FPSCR[FPRF] < undefined 
Done 


Round Integer(frac,gbit,rbit,xbit,round_mode) 
In this example, u represents an undetined hexadecimal digit. Comparisons ignore the u 
bits. 


inc € 0 
If round_mode= b‘00’ then 
Do 
If sign || frac[{64] || gbit || rbit || xbit = b’ullux’ then ince 1 
If sign || frac({64] || gbit || rbit || xbit = b’u0llx’ then inc <1 
If sign || frac64 || gbit ||] rbit || xbit = b’u0lul’ then inc ¢ 1 
End 
If round_mode= b‘10’ then 
Do 
If sign || frac64 || gbit || rbit || xbit = b’Oulux’ then inc ¢ 1 
If sign || frac64 || gbit || rbit || xbit = b’Ouulx’ then inc ¢ 1 
If sign || frac6é4 || gbit || rbit || xbit = b’Ouuul’ then inc ¢ 1 
End 
If round_mode= b‘11’ then 
Do 
If sign || frac64 || gbit || rbit || xbit = b’lulux’ then inc ¢€ 1 
If sign || frac64 || gbit || rbit || xbit = b’luulx’ then inc ¢ 1 


If sign || frac64 || gbit || rbit || xbit = b’luuul’ then inc € 1 
End 
frac[0-64] < frac[0-64] + inc 
FPSCR[FR] €¢ inc 
FPSCR[FI] € gbit | rbit | xbit 
Return 
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Infinity Operand 


FPSCR[FR FI VXCVI] < b’001' 
If FPSCR[VE]=0 then Do 
If tgt_precision="32-bit integer" then 
Do 
If sign=0 then FRT © x’uuuu uuut. 7FFF FFFF’ 
If sign=1 then FRT © x’uuuu uuuu 8000 0000’ 
End 
Else 
Do 
If sign=0 then FRT € X'7FFF FFFF FFFF FFFF’ 
If sign=l then FRT ¢€ x’8000 0000 0000 0000’ 
End 
FPSCR[FPRF] < undefined 
End 
Done 


SNaN Qperand 


FPSCR[FR FI VXCVI VXSNAN] <€ b’‘0011' 
If FPSCR[VE]=0 then 
Do 
If tgt_precision="32-bit integer" 
then FRT < x’uuuu uuuu 8000 0000’ 
If tgt_precision="64-bit integer" 
then FRT ¢< x’8000 0000 0000 0000’ 
FPSCR[FPRF] ¢< undefined 
End 
Done 


QNaN Operand 


FPSCR[FR FI VXCVI] €< b’001’ 
If FPSCR[VE]=0 then 
Do 
If tgt_precision="32-bit integer" then FRT < x’‘uuuu uuuu 8000 


0000' 
If tgt_precision="64-bit integer" then FRT<x’ 8000 0000 0000 0000' 
FPSCR[FPRF] < undefined 
End 
Done 
Large Operand 


FPSCR[FR FI VXCVI] < b’001' 
If FPSCR[VE]=0 then Do 
If tgt_precision="32-bit integer" then 
Do 
Tf sign=0 then FRT © x’uuuu uuuu 7FFF FFFF’ 
If sign=1 then FRT € x’uuuu uuuu 8000 0000’ 
End 
Else 
Do 
If sign=0 then FRT ¢ xX‘'7FFF FFFF FFFF FFFF’ 
If sign=1 then FRT < x’8000 0000 0000 0000’ 
End 
FPSCR[FPRF] < undefined 
End 
Done 
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F.4 Floating-Point Convert from Integer Model 


The following algorithm describes the operation of the floating-point convert from integer 
instructions. 


sign © FRB[0] 
exp < 63 
frac ¢< FRB 


If frac=0 then go to Zero Operand 
If sign=1 then frac ¢ —frac +1 


Do until frac[0O]=1 
frac < frac[1-63] ||] b’0' 
exp ¢< exp - l 

End 


Round Float (sign,exp,frac,FPSCR[RN]) 


If sign=1 then FPSCR[FPRF] ¢< “-normal number" 
If sign=0 then FPSCR[FPRF] ¢< "+normal number" 
FRT[0] € sign 

FRT[1-11] < exp + 1023 /* exp + bias */ 

FRT [12-63] €< frac[1-52] 

Done 


Zero Operand 


FPSCR[FR FI] € b’00’ 
FPSCR[FPRF] € "+zero" 

FRT < x’0000 0000 0000 0000' 
Done 


Round Float (sign,exp,frac,round_mode) 
In this example, the bits designated as u are ignored in comparisons. 


inc ¢€ 0 

lsb © frac[52] 

gbit ¢€ frac[53] 

rbhit « frac[54] 

xbit «© frac{55-63]>0 

If round_mode=b’00’ then 
Do 


If sign || lsb |! gbit || rbit || xbit = b’ulluu’ then inc ¢ 1 
If sign |! lsb || gbit || rbit ||] xbit = b’u0llu’ then inc ¢ 1 
If sign || Isb || gbit || rbit || xbit = b’u0lul’ then inc ¢ 1 
End 
If round_mode= b’10’ then 
Do 
If sign || lsb || gbit || rbit || xbit = b’Ouluu’ then inc ¢ 1 
If sign ||] lsb || gbit || rbit || xbit = b’Ouulu’ then inc ¢€ 1 
If sign || lsb || gbit ||! rbit || xbit = b’Quuul’ then inc € 1 
End 
If round_mode= b‘11‘' then 
Do 
If sign ||] lsb || gbit || rbit || xbit = b’luluu’ then inc € 1 
If sign |! lsb || gbit || rbit || xbit = b’luulu’ then inc € 1 
If sign || lsb || gbit || rbit || xbit = b’luuul’ then inc ¢ 1 
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End 
frac[0-52] €< frac[{0-52] + inc 
Tf carry_out=1 then exp ¢ exp + 1 
FPSCR[FR] < inc 
FPSCR[FI] < gbit | rbit | xbit 
If (gbit | rbit | xbit) then FPSCR[XX] < 1 


Return 





MOTOROLA 


F-10 PowerPC 601 RISC Microprocessor User's Manual 


Appendix G 
Synchronization Programming 
Examples 


The examples in this appendix show how synchronization instructions can be used to 
emulate various synchronization primitives and how to provide more complex forms of 
synchronization. 


For each of these examples, it is assumed that a similar sequence of instructions is used by 
all processes requiring synchronization of the accessed data. 


G.1 General Information 


The following points provide general information about the lwarx and stwex. instructions: 


In general, lwarx and stwex. instructions should be paired, with the same effective 
address used for both. The exception is an isolated stwex. instruction that is used to 
clear any existing reservation on the processor, for which there is no paired lwarx 
and for which any (scratch) effective address can be used. 


It is acceptable to execute an Iwarx instruction for which no stwex. instruction is 
executed. For example, such a dangling Iwarx instruction occurs if the value loaded 
in the Test and Set sequence shown Section G.2.5, "Test and Set,” is not zero. 


To increase the likelihood that forward progress is made, it is important that looping 
on Iwarx/stwex. pairs be minimized. For example, in the sequence shown above for 
Test and Set, this is achieved by testing the old value before attempting the store— 
were the order reversed, more stwex. instructions might be executed, and 

reservations might more often be lost between the lwary and the stwex. instructions. 


The manner in which Iwarx and stwex. are communicated to other processors and 
mechanisms, and between levels of the memory subsystem within a given processor 
is implementation-dependent. In some implementations performance may be 
improved by minimizing looping on a lwarx instruction that fails to return a desired 
value. For example, in the Test and Set example shown above, if the programmer 
wishes to stay in the loop until the word loaded is zero, he could change the bne S+ 
12 to bne loop. However, in some implementations better performance may be 
obtained by using an ordinary Load instruction to do the initial checking of the 
value, as follows: 
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loop: lwz rS,0(r3) #load the word 
cmpwi r5,0 #loop back if word 


bne loop #not equal to 0 

lwarx rS,0,r3#try again, reserving 
cmpwi 1r5,0 #(likely to succeed) 

bne loop #try to store nonzero 
stwex. r4,0,r3#loop if lost reservation 
bne loop 


¢ Ina multiprocessor, livelock is possible if a loop containing an lwarx/stwex. pair 
also contains an ordinary Store instruction for which any byte of the affected 
memory area is in the reservation granule of the reservation. For example, the first 
code sequence shown in Section D. 1.2, List Insertion, can cause livelock if two list 
elements have next element pointers in the same reservation granule. 


G.2 Synchronization Primitives 


The following examples show how the Iwarx and stwcex. instructions can be used to 
emulate various synchronization primitives. The sequences used to emulate the various 
primitives consist primarily of a loop using Iwarx and stwex.. Additional synchronization 
is unnecessary, because the stwex. will fail, clearing the EQ bit, if the word loaded by Iwarx 
has changed before the stwex. is executed. 


G.2.1 Fetch and No-Op 


The Fetch and No-Op primitive atomically loads the current value in a word in memory. In 
this example it is assumed that the address of the word to be loaded is in GPR3 and the data 
loaded are returned in GPR4. 


loop: lwarx r4,0,r3#load and reserve 
ctwex. rd ~_ #store old value if still reserved 
bne loop #loop if lost reservation 


Notes: 


1. Because stwex. is not necessarily performed with respect to all other mechanisms 
that access memory, an ordinary load instruction, or even a Load and Reserve 
instruction, on a different processor, may return a stale value. However, a 
subsequent Iwarx on the other processor followed by a successful stwex. on that 
processor is guaranteed to have returned the value stored by the first processor's 
stwex. (in the absence of other stores to the location). 


2. The storing done by the stwex. instruction in this example is redundant. 
G.2.2 Fetch and Store 
The Fetch and Store primitive atomically loads and replaces a word in memory. 


In this example it is assumed that the address of the word to be loaded and replaced is in 
GPR3, the new value is in GPR4, and the old value is returned in GPRS. 
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loop: Ilwarx r5,0,r3#load and reserve 
stwex. r4,0,r3#store new value if still reserved 
bne loop #loop if lost reservation 


G.2.3 Fetch and Add 
The Fetch and Add primitive atomically increments a word in memory. 


In this example it is assumed that the address of the word to be incremented is in GPR3, the 
increment is in GPR4, and the old value is returned in GPRS. 


loop: lwarx rS,0,r3 #load and reserve 
add ra,r4,rS #increment word 
stwex. ra,0,r3 #store new value if still reserved 
bne loop #loop if lost reservation 


G.2.4 Fetch and AND 


The Fetch and AND primitive atomically ANDs a value into a word in memory. 


In this example it is assumed that the address of the word to be ANDed is in GPR3, the 
value to AND into it is in GPR4, and the old value is returned in GPRS. 


loop: lwarx rS,0,r3 #load and reserve 
and ra,r4,rs #AND word 
stwex. ra,0,r3 #store new value if still reserved 
bne loop #loop if lost reservation 


Note: This sequence can be changed to perform another Boolean operation atomically on 
a word in memory, simply by changing the AND instruction to the desired Boolean 
instruction (OR, XOR, etc.). 


G.2.5 Test and Set 
The Test and Set primitive atomically loads a word from memory, ensures that the word in 
memory contains a non-zero value, and sets the EQ bit of CR Field () according to whether 
the value loaded is zero. 


In this example it is assumed that the address of the word to be tested is in GPR3, the new 
value (non-zero) is in GPR4, and the old value is returned in GPRS. 


loop: Ilwarx rS,3,r3 #load and reserve 
cmpwi rS, a #done if word 


bne $+12 #not equal to 0 
stwex. r4,0,r3#try to store nonzero 
bne loop #loop if lost reservation 
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Notes: 


1. Test and Set is shown primarily for pedagogical reasons. It is useful on machines 
that lack the better synchronization facilities provided by Iwarx and stwex.. Test and 
Set does not scale well. Using Test and Set before a critical section allows only one 
process to execute in the critical section at a time. Using Iwarx and stwex. to bracket 
the critical section allows many processes to execute in the critical section at once, 
but at most one will succeed in exiting from the section with its results stored. 


2. Depending on the application, if Test and Set fails (that is, clears the EQ bit of CR 
Field ()) it may be appropriate to re-execute the Test and Set. 


G.3 Compare and Swap 


The Compare and Swap primitive atomically compares a value in a register with a word in 
memory, if they are surely equal stores the value from a second register into the word in 
memory, if they may be unequal loads the word from memory into the first register, and sets 
the EQ bit of CR Field 0 to indicate the result of the comparison. 


In this example it is assumed that the address of the word to be tested is in GPR3, the 
comparand is in GPR4, the new value is in GPRS, and the old value is returned in GPR6. 


lwarx r6,0,r3#load and reserve 
cmpw r4,r6 #first 2 operands equal ? 


bne $4+8 #skip if not 
stwex. rS,0,r3#store new value if still reserved 
Notes: 


1. Compare and Swap is shown primarily for pedagogical reasons. It is useful on 
machines that lack the better synchronization facilities provided by lwarx and 
stwex.. A major weakness of typical Compare and Swap instructions is that they 
permit spurious success if the word being tested has changed and then changed back 
to its old value: the sequence shown above does not have this weakness. 


2. Depending on the application, if Compare and Swap fails (that is, clears the EQ bit 
of CRQ) it may be appropriate to recompute the value potentially to be stored and 
then re-execute the Compare and Swap. 


G.4 List Insertion 


The following example shows how the Iwarx and stwex. instructions can be used to 
implement simple LIFO (last-in-first-out) insertion into a singly-linked list. (Complicated 
list insertion, in which multiple values must be changed atomically, or in which the correct 
order of insertion depends on the contents of the elements, cannot be implemented in the 
manner shown below, and requires a more complicated strategy such as using locks.) 
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The next element pointer from the list element after which the new element is to be inserted, 
here called the parent element, is stored into the new element, so that the new element 
points to the next element in the list: this store is performed unconditionally. Then the 
address of the new element is conditionally stored into the parent element, thereby adding 
the new element to the list. 


In this example it is assumed that the address of the parent element is in GPR3, the address 
of the new element is in GPR4, and the next element pointer is at offset O from the start of 
the element. It is also assumed that the next element pointer of each list element is in a 
reservation granule separate from that of the next element pointer of all other list elements. 


loop: lwarx r2,0,r3#get next pointer 


stw r2,0(r4)#store in new element 

sync #let store settle (can omit if not MP) 
stwex. r4, a, r3#add new element to list 

bne loop #loop if stwcex. failed 


In the preceding example, if two list elements have next element pointers in the same 
reservation granule then, in a multiprocessor, livelock can occur. (Livelock is a state in 
which processors interact in a way such that no processor makes progress.) 


If it is not possible to allocate list elements such that each element's next element pointer is 
in a different reservation granule, then livelock can be avoided by using the following, more 
complicated, code sequence. 


lwz r2,0(r3)#get next pointer 
loopl: mr r5,r2 #keep a copy 

stw r2,0(r4)#store in new element 

sync #let store settle 


loop2: lwarxrZ,0,r3 #get it again 
cmpw r2,r5 #loop if changed (someone 
bne loopl #else progressed) 
stwex. r4,0,r3 #add new element to list 
bne loop2 #loop if failed 
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Glossary of Terms and Abbreviations 


The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this 
book. Some of the terms and definitions included in the glossary are reprinted from /EEE 
Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright ©1985 by 
the Institute of Electrical and Electronics Engineers, Inc. with the permission of the IEEE. 


A Atomic. A bus access that attempts to be part of a read-write operation to the 
same address uninterrupted by any other access to that address (the 
term refers to the fact that the transactions are indivisible). The 
MPC601 initiates the read and write separately, but signals the 
memory system that it is attempting an atomic operation. If the 
operation fails, status is kept so that the MPC601 can try again. The 
MPC601 implements atomic accesses through the lwarx/stwex. 
instruction pair, which asserts the TTO signal. 


B Beat. A single state on the MPC601 interface that may extend across multiple 
bus cycles. An MPC601 transaction can be composed of multiple 
address or data beats. 


Biased Exponent. The sum of the exponent and a constant (bias) chosen to 
make the biased exponent's range non-negative. 


Big-Endian. A byte-ordering method in memory where the address n of a 
word corresponds to the most significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 
being the most significant byte. 


Boundedly Undefined. The results of attempting to execute a given 
instruction are said to be boundedly undefined if they could have 
been achieved by executing an arbitrary sequence of defined 
instructions, in valid form, starting in the state the machine was in 
before attempting to execute the given instruction. Boundedly 
undefined results for a given instruction may vary between 
implementations, and between execution attempts in the same 
implementation. 
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Branch Folding. A technique of removing the branch instruction from the 
instruction sequence. 


Burst. A multiple beat data transfer whose total size is typically equal to a 
cache block (in the MPC6()1: a 32-byte sector). 


Bus Clock. Clock that causes the bus state transitions 


Bus Master. The owner of the address or data bus; the device that initiates or 
requests the transaction. 


Cache. High-speed memory containing recently accessed data and/or 
instructions (subset of main memory). 


Cache Block. The cacheable unit for a PowerPC processor. The size of a 
cache block may vary among processors. For the MPC6()1, it is one 
sector (8 words). 


Cache Coherency. Caches are coherent if a processor performing a read 
from its cache is supplied with data corresponding to the most recent 
value written to memory or to another processor’s cache. 


Cast-Outs. Cache sectors that must be written to memory when a snoop miss 
causes the least recently used section with modified data to be 
replaced. 


Context Synchronization. All instructions in execution complete past the 
point where they can produce an exception; all instructions in 
execution complete in the context in which they began execution; all 
subsequent instructions are fetched and executed in the new context. 


Copy-Back Operations. A cache operation in which a cache line is copied 
back to memory to enforce cache coherency. Copy-back operations 
consist of snoop push-out operations and cache cast-out operations. 


Denormalized Number. A non-zero floating-point number whose exponent 
has a reserved value, usually the format's minimum, and whose 
explicit or implicit leading significand bit is zero. 


Dynamic Store Forwarding. Allows the FPU to collapse a floating-point 
arithmetic operation followed by a floating-point store operation that 
depends on the result of the arithmetic operation into a single 
operation through the pipeline. 
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Exception. An unusual or error condition encountered by the processor that 
results in special processing. 


Exception Handler. A software routine that executes when an exception 
occurs. Normally, the exception handler corrects the condition that 
caused the exception, or performs some other meaningful task (such 
as aborting the program that caused the exception). The addresses of 
the exception handlers are defined by a two-word exception vector 
that is branched to automatically when an exception occurs. 


Exclusive State. MESI state in which only one caching device contains data 
that is also in system memory. Note that in the MPC6(0)1, shared 
cache sectors are also described as shared exclusive, in that data is 
the same in both the cache and in external memory. 


Execution Synchronization. All instructions in execution are architecturally 
complete before beginning execution (appearing to begin execution) 
of the next instruction. Similar to context synchronization but doesn't 
force the contents of the instruction buffers to be deleted and 
refetched. 


Exponent. The component of a binary floating-point number that normally 
signifies the integer power to which two is raised in determining the 
value of the represented number. Occasionally the exponent is called 
the signed or unbiased exponent. 


Feed Forwarding. An MPC6()1 feature that reduces the number of clock 
cycles that an execution unit must wait to use a register. When the 
source register of the current instruction is the same as the 
destination register of the previous instruction, the result of the 
previous instruction is routed to the current instruction at the same 
time that it is written to the register file. With feed forwarding, the 
destination bus is gated to the waiting execution unit over the 
appropriate source bus, saving the cycles which would be used for 
the write and read. 


Floating-Point Unit. The functional unit in the MPC601 processor 
responsible for executing all floating-point instructions plus integer 
multiply and divided instructions. 


Flush. An operation that causes a modified cache sector to be invalidated and 
the data to be written to memory. 


Fraction. The field of the significand that lies to the right of its implied binary 
point. 
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G General-Purpose Registers. Any of the 32 registers in the MPC6()1 register 
file. These registers provide the source operands and destination 
results for all MPC601 data manipulation instructions. Load 
instructions move data from memory to registers, and store 
instructions move data from registers to memory. 


I IEEE 754. A standard written by the Institute of Electrical and Electronics 
Engineers that defines operations of binary floating-point arithmetic 
and representations of binary floating-point numbers. 


Instruction Unit. The functional unit in the MPC601 processor that fetches 
all instructions from memory and performs the initial stages of 
instruction decoding. The instruction unit also contains the branch 
processing unit and performs all instruction address calculations 
(including branch address calculations). 


Integer Unit. The functional unit in the MPC601 processor responsible for 
executing all instructions except floating point, integer multiply and 
divide, and change of flow instructions. 


Interrupt. An external signal that causes the MPC6(0)I to suspend current 
execution and take a predefined exception. 


Invalid State. MESI state (I) that indicates that the cache sector does not 
contain valid data. 


K Kill. An operation that causes a cache sector to be invalidated. 
L Latency. The number of clock cycles necessary to execute an instruction and 


make ready the results of that instruction. 


Little-Endian. A byte-ordering method in memory where the address n of a 
word corresponds to the least significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 
being the most significant byte. 


Livelock. A state in which processors interact in a way such that no processor 
makes progress. 


M Memory-Mapped Accesses. Accesses whose addresses use the segmented 
or block address translation mechanisms provided by the MMU and 
that occur externally with the bus protocol defined for memory. 
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Memory Coherency. Refers to memory agreement between caches in a 
multiple processor and system memory (e.g. MESI cache 
coherency). 


Memory Consistency. Refers to levels of memory with respect to a single 
processor and system memory (e.g. on-chip cache, secondary cache, 
and system memory). 


Memory-Forced I/O Controller Interface Access(BUID = x'07F'). These 
accesses are made to memory space. They do not use the extensions 
to the memory protocol described for I/O controller interface 
accesses, and they bypass the page- and block -translation and 
protection mechanisms. 


MESI Modified. MESI state in which one, and only one, caching device has 
the valid data for that address. The data at this address in external 
memory is not valid. 


NaN. Not a number; a symbolic entity encoded in floating-point format. 
There are two types of NaNs—signaling NaNs and quiet NaNs. 


No-Op. No-operation. A single-cycle operation that does not affect registers 
or generate bus activity. 


Overflow. An error condition that occurs during arithmetic operations when 
the result cannot be stored accurately in the destination register(s). 
For example, if two 32-bit numbers are added, the sum may require 
33 bits due to carry. Since the 32-bit registers of the MPC601 cannot 
represent this sum, an overflow condition occurs. 


Packet. It is used in the MPC6()1 with respect to [/O controller interface 
operations. 


Page. A4-Kbyte area of memory, aligned on a 4-Kbyte boundary. 


Park. The act of allowing a bus master to maintain mastership of the bus 
without having to arbitrate. 


Pipelining. A technique that breaks instruction execution into distinct steps 
so that multiple steps can be performed at the same time. 


Precise Exceptions. The pipeline can be stopped so the instructions that 
’ preceded the faulting instruction can complete, and subsequent 
instructions can be executed from scratch. The system is precise 
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unless one of the imprecise modes for invoking the floating-point 
enabled exception is in effect. 


Processor Clock. Internal P_CLOCK signal. 


Quiesce. To come to rest. The processor is said to quiesce when an exception 
is taken or a sync instruction is executed. The instruction stream is 
stopped at the decode stage and executing instructions are allowed to 
complete to create a controlled context for instructions that may be 
affected by out-of-order, parallel execution. See Synchronization. 


Quiet NaNs. Propagate through almost every arithmetic operation without 
signaling exceptions. These are used to represent the results of 
certain invalid operations, such as invalid arithmetic operations on 
infinities or on NaNs, when invalid. 


Register Renaming. The use of shadowing that allows a register to be 
updated by instructions that are executed out of order without 
destroying machine state information. The MPC601 implements a 
two-entry link register shadow to improve performance and to help 
handle the precise exception model required by PowerPC. 


Scan Interface. The MPC601’s test interface. 


Sector. One half of a MPC601 cache line. Each MPC601 cache line is 16 
words long; therefore, each sector is 8 words long. Cache coherency 
is maintained with sector granularity. In the MPC6()1, the sector is 
equivalent to a cache block. 


Shared State. MESI protocol state in which two or more caching devices 
contain the same information. In the MPC601, shared implies 
shared, exclusive. That is, shared data is identical to the data at that 
address in system memory. 


Signaling NaNs. Signal the invalid operation exception when they are 
specified as arithmetic operands 


Significand. The component of a binary floating-point number that consists 
of an explicit or implicit leading bit to the left of its implied binary 
point and a fraction field to the right. 


Slave. The device addressed by a master device. The slave is identified in the 
address tenure and is responsible for supplying or latching the 
requested data for the master during the data tenure. 
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Snooping. Monitoring addresses driven by a bus master to detect the need for 
coherency actions. 


Snoop Push. Write-backs due to a snoop hit. The sector may or may not 
transition to shared state. 


Split-Transaction. A transaction with independent request and response 
tenures. 


Split-Transaction Bus. A bus that allows address and data transactions from 
different processors to occur independently. 


Static Branch Prediction. Mechanism by which software (for example, 
compilers) can give a hint to the machine hardware about the 
direction the branch is likely to take. 


Superscalar Machine. A machine that can issue multiple instructions 
concurrently from a conventional linear instruction stream. 


Supervisor Mode. The privileged operation state of the MPC601. In 
supervisor mode, software can access all control registers and can 
access the supervisor memory space, among other privileged 
operations. 


Tenure. The period of bus mastership. For the MPC6()1, there can be separate 
address bus tenures and data bus tenures. A tenure consists of three 
phases: arbitration, transfer, termination 


Transaction. A complete exchange between two bus devices. A transaction 
is minimally comprised of an address tenure; one or more data 
tenures may be involved in the exchange. There are two kinds of 
transactions: address/data and address-only. 


Transfer Termination. Signal that refers to both signals that acknowledge 
the transfer of individual beats (of both single-beat transfer and 
individual beats of a burst transfer) and to signals that mark the end 
of the tenure. 


Underflow. An error condition that occurs during arithmetic operations when 
the result cannot be represented accurately in the destination register. 
For example, underflow can happen if two floating-point fractions 
are multiplied and the result is a single-precision number. The result 
may require a larger exponent and/or mantissa than the single- 
precision format makes available. In other words, the result is too 
small to be represented accurately. 





Glossary of Terms and Abbreviations Glossary-7 


Unified Cache. Combined data and instruction cache. 


User Mode. The unprivileged operating state of the MPC601. In user mode, 
software can only access certain control registers and can only 
access user memory space. No privileged operations can be 
performed. 


W Write-Through. A memory update policy in which all processor write cycles 
are written to both the cache and memory. 
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Cache organization, 4-2 
Cache sector line-fill operation, 4-5 
Cache sector push operation, 4-5, 4-17 
Cache unit 
memory performance, 7-25 
operation of the cache, 9-2 
overview, 4-1 
Cache-inhibited accesses (I bit) 
cache interactions, 4-7 
MMU (I-bit setting), 6-10, 6-16, 6-57, 6-61 
timing considerations, 7-26 
Change (C) bit maintenance 
recording, 6-8, 6-39, 6-40, 6-41 
updates, 6-56 
Checkstop signal, 9-47 
Checkstop sources and enables register (HIDO), 2-36 
Checkstop state, 5-21 
CI, 8-14 
CKSTP_IN, 8-25 
CKSTP_OUT, 8-26 
cles, 4-18, 10-25 
Clean block operation, 4-15 
Clock signals 
2X_PCLK, 8-31 
BCLK_EN, 8-32 
PCLK_EN, 8-31 
RTC, 8-35 
cmp, 10-26 
cmpi, 10-27 
cmpl, 10-28 
cmpli, 10-29 
entizd, C-4 
cntlzw, 10-30 
Coherency precautions, 4-11 
Complement register, simplified mnemonic, 3-93 
Context synchronization, 2-24, 3-2 
Copy-back made, 7-25 
CR (condition register) 
CR bit fields, 2-11 
CR settings, 3-39, B—2 
crand, 10-31 
crandc, 10-32 
creqv, 10-33 
crnand, 10-34 
crnor, 10-35 
cror, 10-36 
crorc, 10-37 
crxor, 10-38 
CSEO-CSE2 signals, 8-15, 9-27 
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Effective address calculation 

D address translation, 6-1 

: . branches, 3-2, 3-63 
DABR (data address breakpoint register, HIDS), 2-40 loads and sores. 322, 349-3655 
DAR (data address ea 2-27 eieio, 3-53, 4-20, 10-54 
Data access exception, ¢ -21 eqv, 10-55 
Data breakpoints (DABR), 2-40, 6-12, 6-13 Berar terminations 0-05 
Data bus ESP interface, 8-29 


arbitration signals, 8-19, 9-8 

bus arbitration, 9-21 

data tenure, 9-7, 9-37 

data transfer, 8-21, 9-22 

data transfer termination, 8-23, 9-23 
Data transfers, alignment, 2-43, 9-15, 9-17 
DBB, 8-20, 9-8, 9-22 


Exceptions 
alignment exception, 5-25 
asynchronous exceptions, 5-6, 5-7 
data access exception, 5-21 
decrementer exception, 5-45 
enabling and disabling, 5-13 


DBG. 8-19. 9-8 exception classes, 5-2 


DSWO. 8-19. 9-8. 9-50 exception priorities, 5-7 


DBWO exception processing, 5-9, 5-13 
debf. 4-20 st external interrupt, 5-25 
cbf, 4-20, 10-3 


en FP unavailable exception, 5-44 
nee ee I/O controller interface error, 5-46 
Ee. Oe. instruction access exception, 5-24 

, ’ machine check exception, 5-19 
dcbtst, 4-19, 10-43 7 


precise exceptions, 5-5 
dcbz, 4-19, 10-44 


iorities, 5-10 
Debug modes register (HID1), 2-38 abuts 


; panda program exception, 5-32 
DEC (decrementer register), 2-28, B-7 register settings 


Decode timing, 7-10 FPSCR, 5-33 
Decrementer exception, 5-45 MSR. 5-11. 5-15 
Defined instruction class, D-1 SRRO. SRRI. 5-10 


DH0-DH31/DL0-DL3 1, 8-21 

Direct address translation (translation disabled) 
data accesses, 6-7, 6-8, 6-16, 6-24, 6-34 
instruction accesses, 6-7, 6-8, 6-16, 6-24, 6-34 


reset, 5-16 

run mode exception, 5-48 
summary, 3-3 

summary table, 5-2 


div, 10-45 synchronous/precise, 5-6 
divd, C-4 system call exception, 5-47 
divdu, C-5 vector offset table, 5-2, 5-16 
divs, 10-46 Execute timing, instruction, 7-12 
divw, 10-47 Execution units, 1-6 
divwu, 10-49 External control instructions, 3-90 
Double-speed processor clock (2X_PCLK), 8-31 extsb, 10-56 
doz, 10-50 extsh, 10-57 
dozi, 10-51 extsw, C-6 
DPO0—DP7, 8-22 
DPE, 8-23 
DATAY, 8-24, 9-23, 9-25 F 
DSISR (DAE/source instruction service register) fabs, 10-58 
format, 2-27 fadd, 10-59 
settings for alignment exception, 5-30 fcfid, C-7 
settings for DAE, 5-21 fempo, 10-60 
fempu, 10-61 
E fetid, C-7 
fctidz, C-8 
EAR (external access register), 2-31 fctiw, 10-62 
eciwx, 10-52 fctiwz, 10-63 





MOTOROLA Index Index-3 


INDEX 


fdiv, 10-64 Hashed page tables, 6-41 
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Feed forwarding, 7-3 primary PTEG, 6-45, 6-52, 6-55 
Floating-point instructions secondary PTEG, 6-45, 6-53, 6-56 
data formats, 2-59 HID registers, 2-35 
floating-point models, F-2 HP_SNP_REQ, 8-16 
IEEE-754 compatability, 2-55 HRESET, 8-26 
precision handling, 2-66 
rounding, 2-68 | 


Floating-point model 
compare instructions, -3-39 
FP arithmetic instructions, 3-30 
FP exception mode bits, 5-12 
FP multiply-add instructions, 3-34 


FP unavailable exception, 5-44 ae 
FPSCR instructions, 3-40 address and data tenures, 9-37 


detailed description, 9-41 
load access, timing, 9-46 
load operations, 9-40 
store access, timing, 9-47 
store operations, 9-39 
transactions, 9-38 
XATS signal, 9-37 
I/O controller interface error exception, 5-46 
memory-forced accesses, 6-61 
no-op instructions, 6-62 
operations, 8-8 


I/O controller interface 
address translation, 6-59 
alignment exception, 5-27 
architectural ramifications of accesses, 9-36 
bus protocol 


program exceptions, 5-33 

rounding and conversion instructions, 3-37 
Floating-point numbers, conversion, F-1 
Floating-point unit 

execution timing, 7-22 

overview, 1-7 
Flow control instructions, 3-63 

branch instructions, 3-74 

condition register logical, 3-75 

system linkage, 3-76 
Flush block operation, 4-15 


fmadd, 10-65 protection, 6-60 

fmr, 10-66 selection of I/O controller interface segments, 6- 
fmsub, 10-67 35 

fmul, 10-68 unsupported instructions, 6-61 

fnabs, 10-69 I/O tenures, 9-38 

fneg, 10-70 IABR (instruction address breakpoint register, HID2), 
fnmadd, 10-71 ae 2-39 

fnmsub, 10-73 icbi, 4-21, 10-77 <5 Ls 

FPCC (floating-point condition code), 3-39 IEEE 1149.1-compatible interface, 9-49 

FPRO-FPR31 (floating-point registers), 2-6 Iilegal instruction class, D-2_ 

FPSCR (floating-point status and control register), 2-7 Imprecise exceptions, 5-7, 5-9 

FPSCR instructions, 3-40 Instruction 

fres, C-9 stmw, 10-192 

frsp, 10-75 Instruction access exception, 5-24 

frsqrte, C-10 Instruction breakpoints (IABR), 2-39, 6-12, 6-13 

fsel, C-11 Instruction flow, 7-4 

fsqrt, C-11 Instruction prefetch 

fsub, 10-76 MMU constraints, 6-1 1 


Instruction queue, 1-6, 7-4 
Instruction stages, 7-5 


G Instruction timing 

GBL, 8-15 instruction flow, 7-4 

GPRO-GPR31 (general purpose registers), 2-6 instruction queue, 7-4 

Guarded memory, 6-12 instruction stages, 7-5 
overview, 7-1 

H timing considerations, 7-2 


Instruction TLB (ITLB), 6-15 
Instruction unit, 1-5 


Hardware consideratons, MESI, 4-10 
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abs, 10-7 eqv, 10-55 
add, 10-8 external control, 3-90 
adde, 10-9 extsb, 10-56 
adde, 10-10 extsh, 10-57 
addi, 3-92, 10-11 extsw, C-6 
addic, 10-12, 10-13 fabs, 10-58 
addis, 3-92, 10-14 fadd, 10-59 
addme, 10-15 fctid, C-7 
addze, 10-16 fempo, 10-60 
and, 10-17 fempu, 10-61 
andc, 10-18 fctid, C-7 
andi, 10-19 fetidz, C-8 
andis., 10-20 fctiw, 10-62 
b, 10-21 fctiwz, 10-63 
be, 10-22 fdiv, 10-64 
bectr, 10-23 floating-point 
belr, 10-24 arithmetic, 3-30 
branch address calculation, 3-63 compare, 3-30), 3-39 
branch instructions, 3-74 double-precision conversion, load, 3-59 
cache management instructions, 4-17 double-precision conversion, store, 3-61 
classes of instructions, D-1 FP status and control register, 3-40 
cles, 4-18, 10-25 multiply-add, 3-30, 3-34 
cinp, 10-26 rounding and conversion, 3-30, 3-37 
empi, 10-27 status and control register, 3-30 
cmpl, 10-28 floating-point models, F-2 
cmpli, 10-29 flow control, 3-1, 3-63 
cntlzd, C-4 fmadd, 10-65 
entlzw, 10-30 fmr, 10-66 
condition register logical, 3-75 fmsub, 10-67 
crand, 10-3} fmul, 10-68 
crandc, 10-32 fnabs, 10-69 
creqv, 10-33 fneg, 10-70 
crnand, 10-34 fnmadd, 10-71 
crnor, 10-35 fnmsub, 10-73 
cror, 10-36 fres, C-9 
crore, 10-37 frsp, 10-75 
crxor, 10-38 frsqyrte, C-10 
dcbf, 4-20, 10-39 fsel, C-11 
dchi, 10-40 fsqrt, C-11 
debst, 4-19, 10-41 fsub, 10-76 
debt, 4-18, 10-42 icbi, 4-21, 10-77 
debtst, 4-19, 10-43 illegal instructions, D-2 
dcbz, 4-19, 10-44 integer 
detined instructions, D-1 arithmetic, 3-4 
div, 10-45 compare, 3-4, 3-15 
divd, C-4 logical, 3-4, 3-16 
divdu, C-S rotate, 3-20 
divs, 10-46 rotate and shift, 3-4, 3-18, 3-19 
divw, 10-47 shift, 3-20 
divwu, 10-49 invalid forms, D-2 
doz, 10-50 isyne, 3-53, 4-21, 10-78 
dozi, 10-St latency summary, 7-26 
eciwx, 3-90, 10-52 Ibz, 10-79 
ecowx, 3-90, 10-53 Ibzu, 10-80 
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Ibzx, 10-82 

Id, C-13 

Idarx, C-13 

Idu, C-14 

Idux, C-15 

Idx, C-15 

Ifd, 10-83 

Ifdu, 10-84 

Ifdux, 10-85 

Ifdx, 10-86 

Ifs, 10-87 

lfsu, 10-88 

lfsux, 10-89 

Ifsx, 10-90 

Iha, 10-91 

Ihau, 10-92 

Ihaux, 10-93 

hax, 10-94 

Ihbrx, 10-95 

Ihz, 10-96 

thzu, 10-97 

Ihzux, 10-98 

Ihzx, 10-99 

Imw, 10-100, B—4 

load and store 
address generation, floating-point, 3-55 
address generation, integer, 3-42 
byte reversal instructions, 3-48 
double-precision conversion for FP load, 3- 

59 
double-precision conversion for FP store, 3- 
61 

floating-point load, 3-57 
floating-point move, 3-62 
floating-point store, 3-60 
integer load, 3-44 
integer multiple, 3-49 
integer store, 3-47 
move multiple, 3-50 

Iscbx, 10-101 

Iswi, 10-103, B4 

Iswx, 10-104, B~4 

Iwa, C-16 

Iwarx, 3-53, 10-105 

Iwaux, C-16 

Iwax, C-17 

Iwbrx, 10-106 

lwz, 10-107 

Iwzu, 10-108 

Iwzux, 10-109 

Iwzx, 10-110 

maskg, 10-111 

maskir, 10-112 
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merfs, 10-114 

merxr, 10-115 

memory control, 3-85 
mfcr, 10-117 

mffs, 10-118 

mfmsr, 10-119, B-1 
mfspr, 3-80, 10-120, B-5S 
mfsr, 10-123, B-1 
mfsrin, 10-124 

mftb, C-17 

mterf, 10-125 

mtfsb0, 10-126 

mtfsb1, 10-127 

mtfsf, 10-128 

mtmsr, 10-130 

mtspr, 3-80, 10-131, B-5 
mtsr, 10-133 

mtsrin, 10-134 

mul, 10-135 

mulhd, C-18 

mulhdu, C-19 

mulhw, 10-136 

mulhwu, 10-137 

mulld, C-19 

mulli, 10-139 

mullw, 10-138 

nabs, 10-140 

nand, 10-141 

neg, 10-142 

no-op, 3-92 

nor, 10-143 

or, 10-144 

orc, 10-145 

ori, 10-146 

oris, 10-147 

POWER instructions in PowerPC, B—-9 
POWER instructions, deleted, B-8 
PowerPC instructions, list, A-1 
processor control, 3-1, 3-80 
reserved bits, B~1 
reserved instructions, D-3 
rfi, 10-148 

rldcl, C-20 

tidcr, C-21 

ridic, C-21 

ridicl, C-22 

ridicr, C-23 

ridimi, C-23 

rlmi, 10-149 

rlwimi, 10-150 

rlwinm, 10-151 

rlwam, 10-152 

trib, 10-153 
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sc, 10-154, B-4 
segment register instructions, B-6 
seginent register manipulation, 3-89 
slbia, C-24 
slbie, C-25 
slbiex, C-26 
sld, C-26 

sle, 10-156 
sleq, 10-157 
sliq, 10-158 
slliq, 10-159 
sllq, 10-160 
slq, 10-161 
slw, 10-162 
srad, C-27 
sradi, C-28 
sraiq, 10-164 
sraq, 10-163 
straw, 10-165 
srawi, 10-166 
std, C-29 

sre, 10-167 
srea, 10-168 
steq, 10-169 
stliq, 10-171 
stlq, 10-172 
stq, 10-173 
srw, 10-174 
stb, 10-175 
stbu, 10-176 
stbux, 10-177 
stbx, 10-178 
std, C-29 
stdcx., C-30 
stdu, C-31 
stdux, C-31 
stdx, C-32 
stfd, 10-179 
stfdu, 10-180 
stfdux, 10-181 
stfdx, 10-182 
stfiwx, C-32 
stfs, 10-183 
stfsu, 10-184 
stfsux, 10-185 
stfsx, 10-186 
sth, 10-187 
sthbrx, 10-188 
sthu, 10-189 
sthux, 10-190 
sthx, 10-191 
stswi, 10-193 
stswx, 10-194 
stw, 10-195 
stwbrx, 10-196 
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stwex., 3-53, 10-197 
stwu, 10-198 
stwux, 10-199 
stwx, 10-200 

subf, 10-201 

subfc, 10-202 
subfe, 10-203 
subfic, 10-204 
subfme, 10-205 
subfze, 10-206 


supervisor-level cache management, 3-85 


support for lwarx/stwex., 9-48 
sync, 3-53, 10-207 

td, C-33 

tdi, C-34 

TLB management, 3-90 
tlbia, C-34 

tlbie, 3-90, 10-208, B-7 
tlbiex, C-35 

tlbsync, C-36 

trap, 3-78 

tw, 10-210 

twi, 10-211 


unimplemented by MPC601, 32-bit, C-1 
unimplemented by MPC601, 64-bit, C-2 


word compare mnemonics, 3-15 

xor, 10-212 

xori, 10-213 

xoris, 10-214 
INT, 8-25, 9-47 
Integer arithmetic instructions, 3-4 
Integer compare instructions, 3-15 
Integer load instructions, 3-44 
Integer logical instructions, 3-16 


Integer rotate and shift instructions, 3-18, 3-19 


Integer store instructions, 3-47 
Integer unit 
execution timing, 7-19 
overview, 1-7 
Interrupt, external, 5-25 
isync, 3-53, 4-21, 10-78 
IU, 3-4 


K 


Key (Ks, Ku) protection bits, 6-19 
Kill block operation, 4-15 


L 

Latency, 7-1, 7-26, 9-22 
Ibz, 10-79 

Ibzu, 10-80 

Ibzux, 10-81 

Ibzx, 10-82 
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Idu, C-14 

Idux, C-15 
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lfd, 10-83 

Ifdu, 10-84 

Ifdux, 10-85 

Ifdx, 10-86 

Ifs, 10-87 

Ifsu, 10-88 

Ifsux, 10-89 

lfsx, 10-90 

Iha, 10-91 

Ihau, 10-92 

Ihaux, 10-93 

Ihax, 10-94 

Ihbrx, 10-95 

lhz, 10-96 

Ihzu, 10-97 

Ihzux, 10-98 

lhzx, 10-99 

Little-endian byte ordering, 2-45 

Imw, 10-100, B~4 

Load address, simplified mnemonic, 3-93 

Load immediate, simplified mnemonic, 3-92 

Load operations 
I/O load accesses, 9-40 
memory coherency actions, 4-13 

Load/store 
address generation, 3-42 
byte reverse instructions, 3-48 
floating-point load instructions, 3-57 
floating-point move instructions, 3-62 
floating-point store instructions, 3-60 
integer load instructions, 3-44 . 
integer store instructions, 3-47 
load/store multiple instruction, 3-49 
memory synchronization instructions, 3-53 
move multiple instructions, 3-50 

Logical addresses 
translation into physical addresses, 6-1 

LR (link register), 2-18 

Iscbx, 10-101 

Iswi, 10-103, B-4 

Iswx, 10-104, B4 

Iwa, C-16 

Iwarx, 3-53, 10-105 

Iwarx/stwcx. 
general information, 4-14, G-1 
support, 9-48 

Iwaux, C-16 

Iwax, C-17 

Iwbrx, 10-106 
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Iwz, 10-107 
Iwzu, 10-108 
Iwzux, 10-109 
Iwzx, 10-110 
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Machine check exception, 2-36, 5-19 
maskg, 10-111 
maskir, 10-112 
merf, 10-113 
merfs, 10-114 
merar, 10-115 
Memory accesses, 9-4 
Memory coherency bit (M bit) 
MMU (M-bit setting), 6-16 
cache interactions, 4-7 
coherency in multiprocessor systems, 4-12 
MMU (M-bit setting), 6-10, 6-57, 6-61 
timing considerations, 7-25 
Memory control instructions 
cache management, 3-85, 3-86 
segment register manipulation, 3-89 
TLB management, 3-90 
Memory management unit 
address translation flow, 6-8 
address translation mechanisms, 6-5, 6-21 
block address translation, 6-5, 6-8, 6-24 
block diagram, 6-3 
direct address translation, 6-7, 6-8, 6-16, 6-24, 6- 
34 
exceptions, 6-12 
hashing functions, 6-45 
instruction TLB (ITLB), 6-15 
instructions and registers, 6-14 
memory protection, 6-7, 6-19, 6-31 
memory/cache access modes (WIM bits), 6-10 
overview, 1-8, 6 2 
page address translation, 6-5, 6-8, 6-35, 6-41 
page history status, 6-8, 6-39 
page table search operation, 6-53 
page tables in memory, 6-41 
segment model, 6-31 
virtual address (52-bit), 6-35 
Memory synchronization 
eieio, 3-53 
isync, 3-53 
Iwarx, 3-53 
stwex., 3-53 
sync, 3-53 
Memory unit 
bus interface logic, 4-25 
operation for loads and stores, 9-4 
overview, 1-9, 4-22 
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queuing structure, 4-14, 4-24 
Memory update modes 
copy-back mode, 7-25 
Memory/cache access modes, purpose, 7-25 
Memory/cache access modes, see WIM bits 
MESI protocol 
definition, MESI states, 1-21, 4-8 
enforcing memory coherency, 9-27 
hardware considerations, 4-10 
MESI state definitions, 4-8 
mfcr, 10-117 
mffs, 10-118 
mfmsr, 10-119, B-1 
mfspr, 3-80, 10-120 
mfspr, POWER and PowerPC, B-S 
mfsr, 10-123, B-1 
mfsrin, 10-124 
mftb, C-17 
Misaligned data transfer, 9-17 
Move register, simplified mnemonic, 3-93 
MQ register, 2-14, 3-4 
MSR (machine state register), 2-20 
mtcrf, 10-125 
mtfsbO, 10-126 
mtfsb1, 10-127 
mtfsf, 10-128 
mtmsr, 10-130 
mtspr, 3-80, 10-131 
mtspr, POWER and PowerPC, B-5 
mtsr, 10-133 
mtsrin, 10-134 
mul, 10-135 
mulhd, C-18 
mulhdu, C-19 
mulhw, 10-136 
mulhwu, 10-137 
mulld, C-19 
mulli, 10-139 
mullw, 10-138 
Multiple-precision shifts, 3-20, E-1 
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nabs, 10-140 

nand, 10-141 

neg, 10-142 

No-op, 3-92 

nor, 10-143 

Normal termination, 9-23 
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Operand placement and performance, 2-42 
Operating environment architecture, 1-11 
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or, 10-144 

orc, 10-145 

ori, 10-146 

oris, 10-147 

Out-of-order instruction issue, 7-13 


P 


Page address translation 
generation of physical addresses, 6-35 
page address translation flow, 6-41 
page memory protection, 6-19, 6-40 
page size, 6-31 
page tables in memory, 6-41 
segment registers, 6-33, 6-36 


selection of page address translation, 6-5, 6-34 


table search operation, 6-53 
UTLB organization, 6-33 


virtual address and virtual segment ID, 6-35 


Page history status 
R and C bit recording, 6-8, 6-39 
R and C bit updates, 6-56 
Page tables 
allocation of PTEs, 6-49 
example table structures, 6-49, 6-51 
organized as PTEGs, 6-43 
page table size, 6-44 
page table updates, 6-56 
PTE format, 6-37 
PTEG addresses, 6-47, 6-51 
table search for PTE, 6-53 
PCEK_EN, 8-31 
Performance considerations, memory, 7-25 
Physical address generation 
block physical address generation, 6-29 


generation of PTEG addresses, 6-47, 6-51 


memory management unit, 6-1 
page physical address generation, 6-35 


PIR (processor identification register, HID15), 2-41 


POWER architecture 
deleted instructions in PowerPC, B-8 
migraton to PowerPC, B-1 
POWER instructions in PowerPC, B-9 


POWER/PowerPC, incompatibilities, B—1 


svcx instruction, B-4 
PowerPC architecture 
features used in MPC601, 1-13 
instructions, A-1 
levels of implementation, 1-11 
operating environment architecture, 1-11 


POWER/PowerPC, incompatibilities, B—1 


registers/implementation, 1-14 
user instruction set architecture, 1-11 
virtual environment architecture, 1-11 
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PP protection bits, 6-19 

Precise exceptions, 5-5, 5-9 

Prefetch timing, 7-6 

Priorities 
cache access priorities, 4-4 
exception priorities, 5-7, 5-10 
memory unit queuing priorities, 4-24 

Privilege levels 
changing privilege levels, 2-20, 5-14 
supervisor-level cache instruction, 3-85 
supervisor-level registers, 2-20 
user-level cache instructions, 3-86 
user-level registers, 2-6 

Process switching, 5-14 

Processor control instructions, 3-80 

Program exception, 5-32 

Programming model 
supervisor-level registers, 2-20 
user-level registers, 2-6 

Protection of memory areas 
block access protection, 6-19, 6-29, 6-31 
I/O controller interface protection, 6-8, 6-61 
options available, 6-7, 6-19 
page access protection, 6-19, 6-31, 6-40 
programming protection bits, 6-19 
protection violations, 6-12, 6-20, 6-31 

PTEGs (PTE groups) 
definition, 6-43 
example primary and secondary PTEGs, 6-51 
generation of PTEG addresses, 6-47 
table search operation, 6-53 

PTEs (page table entries), 6-35, 6-37, 6-43, 6-53, 6-56 

PVR (processor version register), 2-33 
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Qualified bus grant, 9-8 

Qualified data bus grant, 9-21 
Qualified snoop request, 4-14 
Queuing structure, memory unit, 4-24 
QUIESC_REQ, 8-28 
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Read (with clean) operations, 4-13 
Read atomic operation, 4-15 
Read operation, 4-15 
Read with intent to modify operation, 4-15 
Real-time clock (RTC), B-7 
Reference (R) bit maintenance 
recording, 6-8, 6-39, 6-40, 6-54 
updates, 6-56 
Registers 
DEC, B-7 
PowerPC implementation, 1-14 
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reserved bits, B-2 
supervisor-level 
MSR, 2-20 
SR, 2-22 
supervisor-level SPRs 
BATs, 2-33 
DAR, 2-27 
DEC, 2-28 
DSISR, 2-27 
EAR, 2-31 
HID Registers, 2-35 
PVR, 2-33 
SDR1, 2-29 
SPRGO-SPRG3, 2-31 
SRRO, 2-30 
SRR1, 2-30 
user-level 
CR, 2-11 
FPRO-FPR31, 2-6 
FPSCR, 2-7 
GPRO-GPR31, 2-6 
user-level SPRs 
CTR, 2-19 
LR, 2-18 
MQ, 2-14 
RTC, 2-16 
XER, 2-15 
Reserved instruction class, D-3 
Reset 
hard reset, 2-71, 5-17 
register state after reset, 2-71 
reset exception, 5-16 
soft reset, 2-72, 5-17 
Reset signals 
HRESET, 8-26, 9-48 
QUIESC_REQ, 8-28 
RESUME, 8-27 
FSRV, 8-28 
SC_DRIVE, 8-28 
SRESET, 8-27, 9-48 
SY5_QUIESC, 8-27 
RESUME, 8-27 
rfi, 10-148 
tidcl, C-20 
rider, C-21 
ridic, C-21 
rldicl, C-22 
ridicr, C-23 
ridimi, C-23 
tlmi, 10-149 
rlwimi, 10-150 
rlwinm, 10-151 
tlwnm, 10-152 
Rotate and shift operations, 3-18 
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Rounding, floating-point operations, 2-68 
trib, 10-153 
RSRV, 8-28, 9-49 
RTC (real time clock) 
RTC facility, 2-16 
signal, 8-35 
Run mode exception, 5-48 


S 


sc, 5-47, 10-154 
SC_DRIVE, 8-28 
SDR1 (table search description) register 
format, 2-29, 6-44 
generation of PTEG addresses, 6-47, 6-51 
Segment registers 
format, 2-22, 6-36, 6-60 
instructions, 6-37, B-6 
SR manipulation instructions, 3-89 
T-bit, 2-22, 6-34, 9-35 
updates, 2-24 
Segmented memory model, see Memory management 
unit 
SHD, 8-18 
Signals 
2X_PCLK, 8-31 
A0-A3 1, 8-7 
BACK, 8-16 
ABB, 8-5, 9-8 
address arbitration, 9-8 
address transfer, 9-12 
address transfer attribute, 9-13 
APO-AP3, 8-8 
APE, 8-9 
ARTRY, 8-17, 9-23 
BCLK_EN, 8-32 
BG, 8-4, 9-8 
BR, 8-4, 9-8 
checkstop, 9-47 
CI, 8-14 
CKSTP_IN, 8-25 
CKSTP_OUT, 8-26 
configuration, 8-2 
CSEO-CSE2, 8-15, 9-27 
data arbitration, 9-8, 9-21 
data transfer termination, 9-23 
DEB, 8-20, 9-8, 9-22 
DEG, 8-19, 9-8 
DBWO, 8-19, 9-8, 9-50 
DHO-DH31/DLO0-DL31, 8-21 
DPO-DP7, 8-22 
DPE, 8-23 
DRTRY, 8-24, 9-23, 9-25 
ESP interface, 8-29 
GBL, 8-15 
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HP_SNP_REQ, 8-16 
HRESET, 8-26 
INT, 8-25, 9-47 
PCLK_EN, 8-31 
QUIESC_REQ, 8-28 
reset, 9-48 
RESUME, 8-27 
RSRV, 8-28, 9-49 
RTC (real time clock), 8-35 - 
SC_DRIVE, 8-28 
SHD, 8-18 
soft stop control, 9-48 
SRESET, 8-27, 9-48 
SYS_QUIESC, 8-27 
TA, 8-23, 9-23 
TBST, 8-13, 9-22 
TCO-TC1, 8-14, 9-17 
TEA, 8-24, 9-23, 9-25 
TS, 8-6 
TSIZO-TSIZ2, 8-12, 9-14 
TTO-TT4, 8-10, 9-13 
WT, 8-15 
XATS, 8-6, 9-37 
Simplified mnemonics, 3-91 
Single-beat reads with data delays, timing, 9-32 
Single-beat transfer 
back-to-back, timing, 9-33 
reads with data delays, timing, 9-31 
reads, timing, 9-29 
termination, 9-23 
writes, timing, 9-30 
slbia, C-24 
slbie, C-25 
slbiex, C-26 
sld, C-26 
sle, 10-156 
sleq, 10-157 
sliq, 10-158 
slliq, 10-159 
silq, 10-160 
slq, 10-161 
slw, 10-162 
Snoop operation, 4-14, 7-25, 9-19 
Snoop status signals, 4-14 
Soft stop control signals, 9-48 
Split-bus transaction, 9-9 
SPR encodings, unimplemented in MPC601, C-1, C-3 
SPRGO-SPRG3 (general SPRs), 2-31 
SR (segment register), 2-22 
srad, C-27 
sradi, C-28 
sraiq, 10-164 
sraq, 10-163 
sraw, 10-165 
srawi, 10-166 
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std, C-29 

sre, 10-167 

srea, 10-168 

sreq, 10-169 

SRESET, 8-27 

srliq, 10-171 

stlq, 10-172 

srq, 10-173 

SRRO/SRR1 (status save/restore registers), 2-30 

srw, 10-174 

Static branch prediction, 7-15 

stb, 10-175 

stbu, 10-176 

stbux, 10-177 

stbx, 10-178 

std, C-29 

stdcx., C-30 

stdu, C-31 

stdux, C-31 

stdx, C-32 

stfd, 10-179 

stfdu, 10-180 

stfdux, 10-181 

stfdx, 10-182 

stfiwx, C-32 

stfs, 10-183 

stfsu, 10-184 

stfsux, 10-185 | 

stfsx, 10-186 

sth, 10-187 

sthbrx, 10-188 

sthu, 10-189 

sthux, 10-190 

sthx, 10-191 

stmw, 10-192 

Store operations 
I/O operations to BUC, 9-39 
memory coherency actions, 4-14 
single-beat writes, 9-30 

stswi, 10-193 

stswx, 10-194 

stw, 10-195 

stwbrx, 10-196 

stwex., 3-53, 10-197 

stwu, 10-198 

stwux, 10-199 

stwx, 10-200 

subf, 10-201 

subfc, 10-202 

subfe, 10-203 

subfic, 10-204 

subfme, 10-205 

subfze, 10-206 

Supervisor mode, see privilege levels 


INDEX 


syne, 3-53, 10-207 

sync operation, 4-15 

Synchronization 
context, 2-24 
memory synchronization instructions, 3-53 
syne, B-S 

Synchronous/precise exceptions, 5-6 


System call exception, 5-47 
System linkage instructions, 3-76 
System status 


, 8-25 


QUIESC_REQ, 8-28 
RESUME, 8-27 
RSRV, 8-28 
SC_DRIVE, 8-28 
SRESET, 8-27 
SYS_QUIESC, 8-27 


T 


TA, 8-23, 9-23 
Table search operations 
algorithm, 6-53 
hashing functions, 6-45 
page table definition, 6-43 
SDRI register, 6-44 
table search flow (primary and secondary), 6-54 
TBST, 8-13, 9-22 
TCO-TCI signals, 8-14, 9-17 
td, C-33 
tdi, C-34 
TEA, 8-24, 9-25 
Termination, 9-18, 9-23 
Test signals, 8-30 
Timer facilities, B—7 
Timing diagrams, interface 
address transfer signals, 9-12 
back-to-back single-beat transfers, 9-33 
burst transfers with data delays, 9-34 
I/O controller interface load access, 9-46 
I/O controller interface store access, 9-47 
single-beat reads, 9-29 
single-beat reads with data delays, 9-31 
single-beat writes, 9-30 
single-beat writes with data delays, 9-32 
use of TEA, 9-35 
using DBWO, 9-50 
Timing, instruction 
BPU execution timing, 7-14 
cache arbitration, 7-6 
cache hit, 7-7 
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cache miss, 7-8 
decode timing, 7-10 
FPU execution timing, 7-22 
instruction execute timing, 7-12 
IU execution timing, 7-19 
prefetch timing, 7-6 
writeback timing, 7-14 
TLB invalidate 
TLB invalidate broadcast operations, 6-14, 6-15, 
6-57 
TLB management instructions, 3-90 
tlbie instruction, 6-14, 6-15, 6-57 
tlbia, C-34 
tlbie, 3-90, 10-208 
tlbie, POWER and PowerPC, B-7 
tlbiex, C-35 
tlbsyne, C-36 
tlbsync instruction emulation, 6-57 
TO operand, 3-78 
Transactions, cache, 4-5 
Transfer, 9-12, 9-22 
Trap instructions, 3-78 
TS, 8-6, 9-12 
TSIZO-TSIZ2 signals, 8-12, 9-14 
TTO-TT4, 8-10, 9-13 
tw, 10-210 
twi, 10-211 


U 


Unimplemented instructions in MPC601, C-1 
Use of TEA, timing, 9-35 

User instruction set architecture, 1-11 

User mode, see privilege levels 

Using DBWO, timing, 9-50 

UTLB, 6-33, 6-39 


V 


Vector offset table, exception, 5-2, 5-16 
Virtual address (52-bit) 
logical to virtual to physical address translation, 
6-35 
Virtual environment architecture, 1-11 
Virtual memory implementaticn, 6-2 


W 

WIM bits, 4-7, 6-10, 6-16, 6-57, 6-61, 9-27 
Word compare mnemonics, 3-15 

Write with atomic operation, 4-15 

Write with flush operation, 4-15 

Write with kill operation, 4-15 

Write-back mode, 6-17 

Writeback timing, 7-14 

Write-through (W bit), 6-10 
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Write-through mode (W bit) 
cache interactions, 4-7 
MMU (W-bit setting), 6-16, 6-57, 6-61 
timing considerations, 7-26 
Write-with-flush operations, 4-13 


WT, 8-15 


X 


XATS signal, 8-6, 9-37 

XER (integer exception register), 2-15 
xor, 10-212 

xori, 10-213 

xoris, 10-214 
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Overview 

Registers and Data Types 

Addressing Modes and Instruction Set Summary 
Cache and Memory Unit Operation 

Exceptions 

Memory Management Unit 

Instruction Timing 

Signal Descriptions 

system Interface Operation 

Instruction Set 

MPC601 Instruction Set 

POWER Architecture Cross Reference 
PowerPC Instructions Not Implemented in MPC601 
Classes of Instructions 

Multiple—Precision Shifts 

Floating—Point Models 


synchronization Programming Examples 
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